Preface


Objective of the Book 


The first edition of Basic Econometrics was published thirty years ago. Over the years, there have been 
important developments in the theory and practice of econometrics. In each of the subsequent editions, I have 
tried to incorporate the major developments in the field. The fifth edition continues that tradition. 

What has not changed, however, over all these years is my firm belief that econometrics can be taught to
the beginner in an intuitive and informative way without resorting to matrix algebra, calculus, or statistics
beyond the introductory level. Some subject material is inherently technical. In that case I have put the
material in the appropriate appendix or referred the reader to the appropriate sources. Even then, I have tried to
simplify the technical material so that the reader can get an intuitive understanding of this material.

I am pleasantly surprised not only by the longevity of this book but also by the fact that it is widely
used, by students of economics and finance as well as by students and researchers in the fields of
politics, international relations, agriculture, and health sciences. All these readers will find the new edition,
with its expanded topics and concrete applications, very useful. In this edition I have paid even more attention
to the relevance and timeliness of the real data used in the text. In fact, I have added about fifteen new
illustrative examples and more than thirty new end-of-chapter exercises. Also, I have updated the data for about
two dozen of the previous edition’s examples and more than twenty exercises.

Although I am in the eighth decade of my life, I have not lost my love for econometrics, and I strive to keep 
up with the major developments in the field. To assist me in this endeavor, I am now happy to have Dr. Dawn 
Porter, Assistant Professor of Statistics at the Marshall School of Business at the University of Southern 
California in Los Angeles, as my co-author. Both of us have been deeply involved in bringing the fifth edition 
of Basic Econometrics to fruition. 

The Indian edition of the book has been prepared by Dr. Sangeetha Gunasekar, who has added several Indian
examples to the chapters. This has been done with the idea of helping Indian students better understand
and relate to the subject through Indian data sets.


Major Features of the Fifth Edition 


Before discussing the specific changes in the various chapters, the following features of the new edition are 
worth noting: 


1. Practically all of the data used in the illustrative examples have been updated.
2. Several new examples have been added, especially in the Indian context.
3. In several chapters, we have included extended concluding examples that illustrate the various points
made in the text.
4. Concrete computer printouts of several examples are included in the book. Most of these results are
based on EViews (version 6) and STATA (version 10), as well as MINITAB (version 15).
5. Several new diagrams and graphs are included in various chapters.
6. Several new data-based exercises are included in the various chapters.
7. Small data sets are included in the book, but large sample data are posted on the book’s website,
thereby minimizing the size of the text. The website will also publish all of the data used in the book
and will be periodically updated.
8. In a few chapters, we have included class exercises in which students are encouraged to obtain their
own data and implement the various techniques discussed in the book. Some Monte Carlo simulations
are also included in the book.
9. Multiple choice questions have been included at the end of all chapters.


Specific Changes to the Fifth Edition 


Some chapter-specific changes are as follows: 


1. The assumptions underlying the classical linear regression model (CLRM) introduced in Chapter 3
now make a careful distinction between fixed regressors (explanatory variables) and random regressors.
We discuss the importance of the distinction.
2. The appendix to Chapter 6 discusses the properties of logarithms, the Box-Cox transformations, and
various growth formulas.
3. Chapter 7 now discusses not only the marginal impact of a single regressor on the dependent variable
but also the impact of simultaneous changes in all the explanatory variables on the dependent variable.
This chapter has also been reorganized to follow the same structure as the assumptions in Chapter 3.
4. A comparison of the various tests of heteroscedasticity is given in Chapter 11.
5. There is a new discussion of the impact of structural breaks on autocorrelation in Chapter 12.
6. New topics included in Chapter 13 are missing data, non-normal error terms, and stochastic, or random,
regressors.
7. A non-linear regression model discussed in Chapter 14 has a concrete application of the Box-Cox
transformation.
8. Chapter 15 contains several new examples that illustrate the use of logit and probit models in various
fields.
9. Chapter 16 on panel data regression models has been thoroughly revised and illustrated with several
applications.
10. An extended discussion of Sims and Granger causality tests is now included in Chapter 17.
11. Stationary and non-stationary time series, as well as some of the problems associated with various tests
of stationarity, are now thoroughly discussed in Chapter 21.
12. Chapter 22 includes a discussion of why taking the first differences of a time series for the purpose of
making it stationary may not be the appropriate strategy in some situations.


Besides these specific changes, errors and misprints in the previous editions have been corrected and the 
discussions of several topics in the various chapters have been streamlined. 
Organization and Options


The extensive coverage in this edition gives the instructor substantial flexibility in choosing topics that are 
appropriate to the intended audience. Here are suggestions about how this book may be used. 


One-semester course for the nonspecialist: Appendix A, Chapters 1 through 9, and an overview of Chapters
10, 11, and 12 (omitting all the proofs).

One-semester course for economics majors: Appendix A, Chapters 1 through 13.

Two-semester course for economics majors: Appendices A, B, C, Chapters 1 to 22. Chapters 14 and 16
may be covered on an optional basis. Some of the technical appendices may be omitted.

Graduate and postgraduate students and researchers: This book is a handy reference on the
major themes in econometrics.


Supplements 


The book’s website offers supplementary material for instructors as well as students. To facilitate
classroom teaching, instructors have access to a solutions manual and a digital image library containing all
graphs and figures in the text. For students, the website provides links to various online sites that
they can explore to strengthen the concepts learned, as well as the data sets referenced in the book.
These resources can be accessed at www.mhhe.com/sie-gujarati5e.
Introduction


I.1 What is Econometrics?


Literally interpreted, econometrics means “economic measurement.” Although measurement is an important
part of econometrics, the scope of econometrics is much broader, as can be seen from the following
quotations:

Econometrics, the result of a certain outlook on the role of economics, consists of the application of mathematical
statistics to economic data to lend empirical support to the models constructed by mathematical economics and to
obtain numerical results.¹


... econometrics may be defined as the quantitative analysis of actual economic phenomena based on the concurrent
development of theory and observation, related by appropriate methods of inference.²


Econometrics may be defined as the social science in which the tools of economic theory, mathematics, and
statistical inference are applied to the analysis of economic phenomena.³


Econometrics is concerned with the empirical determination of economic laws.⁴


The art of the econometrician consists in finding the set of assumptions that are both sufficiently specific and
sufficiently realistic to allow him to take the best possible advantage of the data available to him.⁵


Econometricians ... are a positive help in trying to dispel the poor public image of economics (quantitative or
otherwise) as a subject in which empty boxes are opened by assuming the existence of can-openers to reveal
contents which any ten economists will interpret in 11 ways.⁶


The method of econometric research aims, essentially, at a conjunction of economic theory and actual
measurements, using the theory and technique of statistical inference as a bridge pier.⁷


1 Gerhard Tintner, Methodology of Mathematical Economics and Econometrics, The University of Chicago Press, Chicago,
1968, p. 74.
2 P. A. Samuelson, T. C. Koopmans, and J. R. N. Stone, “Report of the Evaluative Committee for Econometrica,” Econometrica,
vol. 22, no. 2, April 1954, pp. 141-146.
3 Arthur S. Goldberger, Econometric Theory, John Wiley & Sons, New York, 1964, p. 1.
4 H. Theil, Principles of Econometrics, John Wiley & Sons, New York, 1971, p. 1.
5 E. Malinvaud, Statistical Methods of Econometrics, Rand McNally, Chicago, 1966, p. 514.
6 Adrian C. Darnell and J. Lynne Evans, The Limits of Econometrics, Edward Elgar Publishing, Hants, England, 1990, p. 54.
7 T. Haavelmo, “The Probability Approach in Econometrics,” Supplement to Econometrica, vol. 12, 1944, preface p. iii.
I.2 Why a Separate Discipline?


As the preceding definitions suggest, econometrics is an amalgam of economic theory, mathematical
economics, economic statistics, and mathematical statistics. Yet the subject deserves to be studied in its own
right for the following reasons.

Economic theory makes statements or hypotheses that are mostly qualitative in nature. For example,
microeconomic theory states that, other things remaining the same, a reduction in the price of a commodity is 
expected to increase the quantity demanded of that commodity. Thus, economic theory postulates a negative 
or inverse relationship between the price and quantity demanded of a commodity. But the theory itself does 
not provide any numerical measure of the relationship between the two; that is, it does not tell by how much 
the quantity will go up or down as a result of a certain change in the price of the commodity. It is the job of 
the econometrician to provide such numerical estimates. Stated differently, econometrics gives empirical 
content to most economic theory. 

The main concern of mathematical economics is to express economic theory in mathematical form 
(equations) without regard to measurability or empirical verification of the theory. Econometrics, as noted
previously, is mainly interested in the empirical verification of economic theory. As we shall see, the
econometrician often uses the mathematical equations proposed by the mathematical economist but puts these
equations in such a form that they lend themselves to empirical testing. And this conversion of mathematical 
into econometric equations requires a great deal of ingenuity and practical skill. 

Economic statistics is mainly concerned with collecting, processing, and presenting economic data in the 
form of charts and tables. These are the jobs of the economic statistician. It is he or she who is primarily 
responsible for collecting data on gross national product (GNP), employment, unemployment, prices, and so
on. The data thus collected constitute the raw data for econometric work. But the economic statistician does
not go any further, not being concerned with using the collected data to test economic theories. Of course,
one who does that becomes an econometrician.

Although mathematical statistics provides many tools used in the trade, the econometrician often needs
special methods in view of the unique nature of most economic data, namely, that the data are not generated
as the result of a controlled experiment. The econometrician, like the meteorologist, generally depends on
data that cannot be controlled directly. As Spanos correctly observes:


In econometrics the modeler is often faced with observational as opposed to experimental data. This has two 
important implications for empirical modeling in econometrics. First, the modeler is required to master very 
different skills than those needed for analyzing experimental data. ... Second, the separation of the data collector
and the data analyst requires the modeler to familiarize himself/herself thoroughly with the nature and structure
of data in question.⁸


I.3 Methodology of Econometrics


How do econometricians proceed in their analysis of an economic problem? That is, what is their
methodology? Although there are several schools of thought on econometric methodology, we present here the
traditional or classical methodology, which still dominates empirical research in economics and other social
and behavioral sciences.⁹


8 Aris Spanos, Probability Theory and Statistical Inference: Econometric Modeling with Observational Data, Cambridge University
Press, United Kingdom, 1999, p. 21.

9 For an enlightening, if advanced, discussion on econometric methodology, see David F. Hendry, Dynamic Econometrics,
Oxford University Press, New York, 1995. See also Aris Spanos, op. cit.
Broadly speaking, traditional econometric methodology proceeds along the following lines:

1. Statement of theory or hypothesis.
2. Specification of the mathematical model of the theory.
3. Specification of the statistical, or econometric, model.
4. Obtaining the data.
5. Estimation of the parameters of the econometric model.
6. Hypothesis testing.
7. Forecasting or prediction.
8. Using the model for control or policy purposes.




To illustrate the preceding steps, let us consider the well-known Keynesian theory of consumption. 


1. Statement of Theory or Hypothesis


Keynes stated: 


The fundamental psychological law ... is that men [women] are disposed, as a rule and on average, to increase their
consumption as their income increases, but not as much as the increase in their income.¹⁰


In short, Keynes postulated that the marginal propensity to consume (MPC), the rate of change of 
consumption for a unit (say, a rupee) change in income, is greater than zero but less than 1. 


2. Specification of the Mathematical Model of Consumption 


Although Keynes postulated a positive relationship between consumption and income, he did not specify the 
precise form of the functional relationship between the two. For simplicity, a mathematical economist might 
suggest the following form of the Keynesian consumption function: 


$$Y = \beta_1 + \beta_2 X, \qquad 0 < \beta_2 < 1 \tag{I.3.1}$$


where Y = consumption expenditure and X = income, and where $\beta_1$ and $\beta_2$, known as the parameters of the
model, are, respectively, the intercept and slope coefficients.

The slope coefficient $\beta_2$ measures the MPC. Geometrically, Eq. (I.3.1) is as shown in Figure I.1. This
equation, which states that consumption is linearly related to income, is an example of a mathematical model
of the relationship between consumption and income that is called the consumption function in economics.
A model is simply a set of mathematical equations. If the model has only one equation, as in the preceding
example, it is called a single-equation model, whereas if it has more than one equation, it is known as a
multiple-equation model (the latter will be considered later in the book).

In Eq. (I.3.1) the variable appearing on the left side of the equality sign is called the dependent variable
and the variable(s) on the right side are called the independent, or explanatory, variable(s). Thus, in the
Keynesian consumption function, Eq. (I.3.1), consumption (expenditure) is the dependent variable and
income is the explanatory variable.


3. Specification of the Econometric Model of Consumption 


The purely mathematical model of the consumption function given in Eq. (I.3.1) is of limited interest to the
econometrician, for it assumes that there is an exact or deterministic relationship between consumption and
income. But relationships between economic variables are generally inexact. Thus, if we were to obtain data
on consumption expenditure and disposable (i.e., after-tax) income of a sample of, say, 500 Indian families
and plot these data on graph paper with consumption expenditure on the vertical axis and disposable income
on the horizontal axis, we would not expect all 500 observations to lie exactly on the straight line of Eq. (I.3.1)
because, in addition to income, other variables affect consumption expenditure. For example, size of family,
ages of the members in the family, family religion, etc., are likely to exert some influence on consumption.

Figure I.1 Keynesian consumption function.

10 John Maynard Keynes, The General Theory of Employment, Interest and Money, Harcourt Brace Jovanovich, New York, 1936,
p. 96.

To allow for the inexact relationships between economic variables, the econometrician would modify the
deterministic consumption function in Eq. (I.3.1) as follows:


$$Y = \beta_1 + \beta_2 X + u \tag{I.3.2}$$


where u, known as the disturbance, or error, term, is a random (stochastic) variable that has well-defined 
probabilistic properties. The disturbance term u may well represent all those factors that affect consumption 
but are not taken into account explicitly. 

Equation (I.3.2) is an example of an econometric model. More technically, it is an example of a linear
regression model, which is the major concern of this book. The econometric consumption function hypoth- 
esizes that the dependent variable Y (consumption) is linearly related to the explanatory variable X (income) 
but that the relationship between the two is not exact; it is subject to individual variation. 
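A quick way to see what the disturbance term does is to simulate data from Eq. (I.3.2). The Python sketch below draws a hypothetical sample of 500 families; the parameter values, the income range, and the assumption that u is normally distributed with a standard deviation of 2000 are illustrative choices only, not estimates from any data set. Because every family receives its own draw of u, the simulated points scatter around the line $\beta_1 + \beta_2 X$ instead of lying on it.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
BETA_1 = 5000.0   # intercept (rupees)
BETA_2 = 0.7      # slope, i.e., the MPC
SIGMA_U = 2000.0  # standard deviation of the disturbance u (assumed normal)


def simulate_families(n=500, seed=42):
    """Draw n (income, consumption) pairs from Y = beta_1 + beta_2 * X + u."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        income = rng.uniform(50_000, 300_000)  # hypothetical disposable income X
        u = rng.gauss(0.0, SIGMA_U)            # disturbance: all omitted influences
        consumption = BETA_1 + BETA_2 * income + u
        sample.append((income, consumption))
    return sample


families = simulate_families()

# Vertical distance of each family from the deterministic line of Eq. (I.3.1)
gaps = [y - (BETA_1 + BETA_2 * x) for x, y in families]
on_line = sum(1 for g in gaps if abs(g) < 1e-9)

# Sample size and number of points lying exactly on the deterministic line
print(len(families), on_line)
```

The gaps are simply the drawn values of u, so their average hovers near zero while individual families deviate from the line in both directions.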

The econometric model of the consumption function can be depicted as shown in Figure I.2.


4. Obtaining Data 


To estimate the econometric model given in Eq. (I.3.2), that is, to obtain the numerical values of $\beta_1$ and $\beta_2$,
we need data. Although we will have more to say about the crucial importance of data for economic analysis
in the next chapter, for now let us look at the data given in Table I.1, which relate to the Indian economy for
the period 1950-51 to 2006-07. The Y variable in this table is the aggregate (for the economy as a whole)
private final consumption expenditure (PFCE) and the X variable is gross domestic product (GDP), a measure
of aggregate income, both measured in rupee crore at 1999-2000 prices. Therefore, the data are in “real”
terms; that is, they are measured in constant (1999-2000) prices. The data are plotted in Figure I.3 (cf. Figure
I.2). For the time being, neglect the line drawn in the figure.


Figure I.2 Econometric model of the Keynesian consumption function (vertical axis: consumption expenditure; horizontal axis: income).
Table I.1 Data on Y (Private Final Consumption Expenditure) and X (Gross Domestic Product), both at 1999-2000
prices, in rupee crore.
YEAR PFCE (Y) GDP (X) | YEAR PFCE (Y) GDP (X) | YEAR PFCE (Y) GDP (X)
1950-51 201090 224786 | 1970-71 392262 474131 1990-91 821863 1083572 
1951-52 213872 230034 | 1971-72 399894 478918 1991-92 839593 1099072 
1952-53 222503 236562 | 1972-73 402573 477392 1992-93 861245 1158025 
1953-54 235879 250960 | 1973-74 412452 499120 1993-94 898682 1223816 
1954-55 243617 261615 | 1974-75 412141 504914 1994-95 942359 1302076 
1955-56 245946 268316 | 1975-76 435546 550379 1995-96 999729 1396974 
1956-57 256826 283589 | 1976-77 444231 557258 1996-97 1077445 1508378 
1957-58 251753 280160 | 1977-78 480455 598885 1997-98 1109656 1573263 
1958-59 274864 301422 | 1978-79 509819 631839 1998-99 1181797 1678410 
1959-60 277991 308018 | 1979-80 498384 598974 1999-00 1253643 1786526 
1960-61 293804 329825 | 1980-81 543243 641921 2000-01 1292986 1864773 
1961-62 298813 340060 | 1981-82 566866 678033 2001-02 1367758 1972912 
1962-63 302706 347253 | 1982-83 572536 697861 2002-03 1397069 2047733 
1963-64 313966 364834 | 1983-84 616974 752669 2003-04 1493871 2222591 
1964-65 332722 392503 | 1984-85 634757 782484 2004-05 1579255 2389660 
1965-66 333017 378157 | 1985-86 661249 815049 2005-06 1689861 2604532 
1966-67 337344 382006 | 1986-87 682116 850217 2006-07 1800874 2871118 
1967-68 356429 413094 | 1987-88 705495 880267 
1968-69 365792 423874 | 1988-89 749530 969702 
1969-70 379378 451496 | 1989-90 786725 1029178 

Source: National Accounts Statistics (2000, 2007, 2009), Central Statistical Organization, Ministry of Statistics and Programme Implementation,
Government of India (http://www.mospi.gov.in/)
5. Estimation of the Econometric Model


Now that we have the data, our next task is to estimate the parameters of the consumption function. The 
numerical estimates of the parameters give empirical content to the consumption function. The actual 
mechanics of estimating the parameters will be discussed in Chapter 3. For now, note that the statistical 
technique of regression analysis is the main tool used to obtain the estimates. Using this technique and the 
data given in Table I.1, we obtain the following estimates of $\beta_1$ and $\beta_2$, namely, 103736.0493 and 0.6303.
Thus, the estimated consumption function is: 


$$\hat{Y}_t = 103736.0493 + 0.6303\,X_t \tag{I.3.3}$$


The hat on the Y indicates that it is an estimate.¹¹ The estimated consumption function (i.e., regression line)
is shown in Figure I.3. 

As Figure I.3 shows, the regression line fits the data quite well in that the data points are very close to the
regression line. From this figure we see that for the period 1950-51 to 2006-07 the slope coefficient (i.e., the
MPC) was about 0.63, suggesting that for the sample period an increase in real income of one rupee led, on
average, to an increase of about 63 paisa in real consumption expenditure.¹² We say on average because the
relationship between consumption and income is inexact; as is clear from Figure I.3, not all the data points lie
exactly on the regression line. In simple terms we can say that, according to our data, the average, or mean,
consumption expenditure went up by about 63 paisa for a rupee’s increase in real income.
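For readers who want to reproduce Eq. (I.3.3), the following Python sketch applies the standard two-variable least-squares formulas (developed in Chapter 3) to the 57 observations of Table I.1. It uses only the standard library; the names `PFCE`, `GDP`, and `ols` are our own. Small differences from the printed coefficients may arise from rounding.

```python
# Table I.1, rupee crore at 1999-2000 prices, years 1950-51 to 2006-07 in order
PFCE = [  # Y: private final consumption expenditure
    201090, 213872, 222503, 235879, 243617, 245946, 256826, 251753, 274864, 277991,
    293804, 298813, 302706, 313966, 332722, 333017, 337344, 356429, 365792, 379378,
    392262, 399894, 402573, 412452, 412141, 435546, 444231, 480455, 509819, 498384,
    543243, 566866, 572536, 616974, 634757, 661249, 682116, 705495, 749530, 786725,
    821863, 839593, 861245, 898682, 942359, 999729, 1077445, 1109656, 1181797, 1253643,
    1292986, 1367758, 1397069, 1493871, 1579255, 1689861, 1800874,
]
GDP = [  # X: gross domestic product
    224786, 230034, 236562, 250960, 261615, 268316, 283589, 280160, 301422, 308018,
    329825, 340060, 347253, 364834, 392503, 378157, 382006, 413094, 423874, 451496,
    474131, 478918, 477392, 499120, 504914, 550379, 557258, 598885, 631839, 598974,
    641921, 678033, 697861, 752669, 782484, 815049, 850217, 880267, 969702, 1029178,
    1083572, 1099072, 1158025, 1223816, 1302076, 1396974, 1508378, 1573263, 1678410, 1786526,
    1864773, 1972912, 2047733, 2222591, 2389660, 2604532, 2871118,
]


def ols(x, y):
    """Least-squares estimates (b1, b2) of the line y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx          # slope: the estimated MPC
    b1 = y_bar - b2 * x_bar   # intercept: the line passes through the means
    return b1, b2


b1, b2 = ols(GDP, PFCE)
print(f"Y-hat = {b1:.4f} + {b2:.4f} X")
```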
Figure I.3 Private final consumption expenditure (Y) in relation to GDP (X), 1950-51 to 2006-07, 1999-2000 prices measured
in rupee crore.


6. Hypothesis Testing 


Assuming that the fitted model is a reasonably good approximation of reality, we have to develop suitable
criteria to find out whether the estimates obtained in, say, Eq. (I.3.3) are in accord with the expectations
of the theory that is being tested. According to “positive” economists like Milton Friedman, a theory or
hypothesis that is not verifiable by appeal to empirical evidence may not be admissible as a part of scientific
enquiry.¹³

11 As a matter of convention, a hat over a variable or parameter indicates that it is an estimated value.
12 Do not worry now about how these values were obtained. As we show in Chapter 3, the statistical method of least
squares has produced these estimates.

13 Milton Friedman, “The Methodology of Positive Economics,” Essays in Positive Economics, University of Chicago Press,
Chicago, 1953.
As noted earlier, Keynes expected the MPC to be positive but less than 1. In our example we found the
MPC to be about 0.63. But before we accept this finding as confirmation of Keynesian consumption theory,
we must enquire whether this estimate is sufficiently below unity to convince us that this is not a chance
occurrence or peculiarity of the particular data we have used. In other words, is 0.63 statistically less than 1?
If it is, it may support Keynes’s theory.

Such confirmation or refutation of economic theories on the basis of sample evidence is based on a branch 
of statistical theory known as statistical inference (hypothesis testing). Throughout this book we shall see 
how this inference process is actually conducted. 
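One common way to frame the question “is 0.63 statistically less than 1?” is a one-sided t test of H0: $\beta_2 = 1$ against H1: $\beta_2 < 1$, using the classical least-squares standard error of the slope. The Python sketch below computes that t statistic; the function name and the five-point toy data set are hypothetical and serve only to check the arithmetic (for that data $\hat\beta_2 = 0.6$ and se$(\hat\beta_2) = \sqrt{0.08}$). A sufficiently large negative value, judged against the t distribution with n - 2 degrees of freedom, would indicate that the slope is statistically below 1; applied to the 57 observations of Table I.1 it would give the actual test.

```python
import math


def slope_t_statistic(x, y, beta2_null=1.0):
    """t statistic for H0: beta_2 = beta2_null in the model y = b1 + b2 * x + u.

    Uses the classical OLS standard error of the slope with n - 2 degrees of freedom.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b1 = y_bar - b2 * x_bar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sigma2_hat = rss / (n - 2)             # estimated error variance
    se_b2 = math.sqrt(sigma2_hat / s_xx)   # standard error of the slope
    return (b2 - beta2_null) / se_b2


# Hypothetical toy data, not from Table I.1
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t = slope_t_statistic(x, y)
print(round(t, 4))  # -1.4142
```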


7. Forecasting or Prediction 


If the chosen model does not refute the hypothesis or theory under consideration, we may use it to predict the
future value(s) of the dependent, or forecast, variable Y on the basis of the known or expected future value(s)
of the explanatory, or predictor, variable X.

To illustrate, suppose we want to predict the mean consumption expenditure for 2007-08. The GDP
value for 2007-08 was 31,29,717 crore rupees.¹⁴ Putting this GDP figure on the right-hand side of
Eq. (I.3.3), we obtain:

$$\hat{Y}_{2007\text{-}08} = 103736.0493 + 0.6303\,(3129717) = 2076396.6744 \tag{I.3.4}$$

or about 2076 thousand crore rupees. Thus, given the value of the GDP, the mean, or average, forecast
consumption expenditure is about 2076 thousand crore rupees. The actual value of the consumption
expenditure reported in 2007-08 was 1947 thousand crore rupees. The estimated model Eq. (I.3.3) thus
overpredicted the actual consumption expenditure by about 130 thousand crore rupees. We could say the
forecast error is about 130 thousand crore rupees. When we fully discuss the linear regression model in
subsequent chapters, we will try to find out if such an error is “small” or “large.” But what is important for
now is to note that such forecast errors are inevitable given the statistical nature of our analysis.
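The forecast in Eq. (I.3.4) is a direct substitution into the estimated line. A minimal Python sketch, assuming the rounded coefficients of Eq. (I.3.3); the function name is our own, and the actual 2007-08 value is entered as 1947000 crore rupees, the rounded figure quoted in the text, so the computed error is approximate.

```python
# Coefficients of the estimated consumption function, Eq. (I.3.3)
B1_HAT = 103736.0493
B2_HAT = 0.6303


def predict_consumption(gdp):
    """Mean PFCE predicted by Eq. (I.3.3) for a given GDP (rupee crore, 1999-2000 prices)."""
    return B1_HAT + B2_HAT * gdp


gdp_2007_08 = 3129717          # GDP for 2007-08 quoted in the text
actual_pfce_2007_08 = 1947000  # "1947 thousand crore", rounded as in the text

forecast = predict_consumption(gdp_2007_08)
forecast_error = forecast - actual_pfce_2007_08  # positive value means overprediction
print(f"forecast = {forecast:.4f}, error = {forecast_error:.0f}")
```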

There is another use of the estimated model Eq. (I.3.3). Suppose the president decides to propose a
reduction in the income tax. What will be the effect of such a policy on income and thereby on consumption 
expenditure and ultimately on employment? 

Suppose that, as a result of the proposed policy change, investment expenditure increases. What will be
the effect on the economy? As macroeconomic theory shows, the change in income following, say, a rupee’s
worth of change in investment expenditure is given by the income multiplier M, which is defined as

$$M = \frac{1}{1 - \text{MPC}} \tag{I.3.5}$$
If we use the MPC of 0.63 obtained in Eq. (I.3.3), this multiplier becomes about M = 2.70. That is, an increase
(decrease) of a rupee in investment will eventually lead to more than a twofold increase (decrease) in income;
note that it takes time for the multiplier to work.
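Where does the multiplier formula come from? A minimal derivation, assuming a simple closed economy with no government sector, in which income X is spent either on consumption Y or on investment I (so that X = Y + I), and consumption follows the linear function of Eq. (I.3.1):

$$
\begin{aligned}
X &= Y + I = \beta_1 + \beta_2 X + I \\
X &= \frac{\beta_1 + I}{1 - \beta_2} \\
M = \frac{\Delta X}{\Delta I} &= \frac{1}{1 - \beta_2} = \frac{1}{1 - \text{MPC}}, \qquad \text{e.g. } \frac{1}{1 - 0.63} \approx 2.70
\end{aligned}
$$

The smaller the share of extra income that leaks out of the spending stream (1 - MPC), the larger the eventual change in income.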

The critical value in this computation is MPC, for the multiplier depends on it. And this estimate of the 
MPC can be obtained from regression models such as Eq. (I.3.3). Thus, a quantitative estimate of MPC
provides valuable information for policy purposes. Knowing MPC, one can predict the future course of 
income, consumption expenditure, and employment following a change in the government's fiscal policies. 




14 Data on PFCE and GDP were available for 2007-08, but we purposely left them out to illustrate the topic discussed in this
section. As we will discuss in subsequent chapters, it is a good idea to save a portion of the data to find out how well the
fitted model predicts the out-of-sample observations.
8. Use of the Model for Control or Policy Purposes

Suppose we have the estimated consumption function given in Eq. (I.3.3). Suppose further that the government
believes that consumer expenditure of about 2500 thousand crore rupees (in 1999-2000 prices) will help increase
the employment rate in the country. What level of income will guarantee the target amount of consumption
expenditure?

If the regression results given in Eq. (I.3.3) seem reasonable, simple arithmetic will show that

$$2500000 = 103736.0493 + 0.6303\,(\text{GDP}) \tag{I.3.6}$$

which gives X = 38,01,783, approximately. That is, an income level of about 3802 thousand crore rupees,
given an MPC of about 0.63, will produce an expenditure of about 2500 thousand crore rupees.

As these calculations suggest, an estimated model may be used for control, or policy, purposes. By an
appropriate fiscal and monetary policy mix, the government can manipulate the control variable X to produce
the desired level of the target variable Y.

Figure I.4 summarizes the anatomy of classical econometric modeling.

Figure I.4 Anatomy of econometric modeling (Economic theory → Mathematical model of theory → Econometric model of theory → Data → Estimation of econometric model → Hypothesis testing → Forecasting or prediction → Using the model for control or policy purposes).
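The control calculation in Eq. (I.3.6) runs the estimated consumption function in reverse: given a target Y, solve for X. A small Python sketch, assuming the coefficients of Eq. (I.3.3); the function name `required_gdp` is hypothetical. Python's `,` format option groups digits in thousands, so the result prints as 3,801,783 rather than in the Indian 38,01,783 style.

```python
B1_HAT = 103736.0493  # intercept from Eq. (I.3.3)
B2_HAT = 0.6303       # slope (MPC) from Eq. (I.3.3)


def required_gdp(target_consumption, b1=B1_HAT, b2=B2_HAT):
    """Invert Y = b1 + b2 * X to find the X that yields a target mean Y."""
    if b2 == 0:
        raise ValueError("slope must be non-zero to invert the consumption function")
    return (target_consumption - b1) / b2


x_needed = required_gdp(2_500_000)  # target PFCE: about 2500 thousand crore rupees
print(f"required GDP = {x_needed:,.0f} crore rupees")  # prints: required GDP = 3,801,783 crore rupees
```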


Choosing among Competing Models 


When a governmental agency (e.g., the Central Statistical Organization) collects economic data, such as
that shown in Table I.1, it does not necessarily have any economic theory in mind. How then does one know
that the data really support the Keynesian theory of consumption? Is it because the Keynesian consumption
function (i.e., the regression line) shown in Figure I.3 is extremely close to the actual data points? Is it
possible that another consumption model (theory) might fit the data equally well? For example, Milton
Friedman has developed a model of consumption, called the permanent income hypothesis.¹⁵ Robert Hall has
also developed a model of consumption, called the life-cycle permanent income hypothesis.¹⁶ Could one or
both of these models also fit the data in Table I.1?

In short, the question facing a researcher in practice is how to choose among competing hypotheses or
models of a given phenomenon, such as the consumption-income relationship. As Miller contends:


No encounter with data is [a] step towards genuine confirmation unless the hypothesis does a better job of coping
with the data than some natural rival. . . . What strengthens a hypothesis, here, is a victory that is, at the same time,
a defeat for a plausible rival.¹⁷


How then does one choose among competing models or hypotheses? Here the advice given by Clive
Granger is worth keeping in mind:¹⁸


15 Milton Friedman, A Theory of the Consumption Function, Princeton University Press, Princeton, N.J., 1957.

16 R. Hall, “Stochastic Implications of the Life Cycle Permanent Income Hypothesis: Theory and Evidence,” Journal of Political
Economy, vol. 86, 1978, pp. 971-987.

17 R. W. Miller, Fact and Method: Explanation, Confirmation, and Reality in the Natural and Social Sciences, Princeton University
Press, Princeton, N.J., 1978, p. 176.

18 Clive W. J. Granger, Empirical Modeling in Economics, Cambridge University Press, U.K., 1999, p. 58.
I would like to suggest that in the future, when you are presented with a new piece of theory or empirical model,
you ask these questions: 
(i) What purpose does it have? What economic decisions does it help with? 


(ii) Is there any evidence being presented that allows me to evaluate its quality compared to alternative theories 
or models? 


I think attention to such questions will strengthen economic research and discussion. 


As we progress through this book, we will come across several competing hypotheses trying to explain 
various economic phenomena. For example, students of economics are familiar with the concept of the 
production function, which is basically a relationship between output and inputs (say, capital and labor).
In the literature, two of the best known are the Cobb-Douglas and the constant elasticity of substitution 
production functions. Given the data on output and inputs, we will have to find out which of the two production 
functions, if any, fits the data well. 

The eight-step classical econometric methodology discussed above is neutral in the sense that it can be 
used to test any of these rival hypotheses. 

Is it possible to develop a methodology that is comprehensive enough to include competing hypotheses? 
This is an involved and controversial topic. We will discuss it in Chapter 13, after we have acquired the 
necessary econometric theory. 


I.4 Types of Econometrics


As the classificatory scheme in Figure I.5 suggests, econometrics may be divided into two broad categories: 
theoretical econometrics and applied econometrics. In each category, one can approach the subject in the 
classical or Bayesian tradition. In this book the emphasis is on the classical approach. For the Bayesian 
approach, the reader may consult the references given at the end of the chapter. 


[Figure: Econometrics branches into Theoretical and Applied; each branch may follow the Classical or the Bayesian approach.]
Figure I.5 Categories of econometrics.


Theoretical econometrics is concerned with the development of appropriate methods for measuring 
economic relationships specified by econometric models. In this respect, econometrics leans heavily on
mathematical statistics. For example, one of the methods used extensively in this book is least squares. 
Theoretical econometrics must spell out the assumptions of this method, its properties, and what happens to 
these properties when one or more of the assumptions of the method are not fulfilled. 

In applied econometrics we use the tools of theoretical econometrics to study some special field(s) of 
economics and business, such as the production function, investment function, demand and supply functions, 
portfolio theory, etc. 

This book is concerned largely with the development of econometric methods, their assumptions, their 
uses, and their limitations. These methods are illustrated with examples from various areas of economics and 
business. But this is not a book of applied econometrics in the sense that it delves deeply into any particular 
field of economic application. That job is best left to books written specifically for this purpose. References 
to some of these books are provided at the end of this book. 
I.5 Mathematical and Statistical Prerequisites


Although this book is written at an elementary level, the author assumes that the reader is familiar with the 
basic concepts of statistical estimation and hypothesis testing. However, a broad but nontechnical overview of 
the basic statistical concepts used in this book is provided in Appendix A for the benefit of those who want 
to refresh their knowledge. Insofar as mathematics is concerned, a nodding acquaintance with the notions of 
differential calculus is desirable, although not essential. Although most graduate-level books in econometrics
make heavy use of matrix algebra, I want to make it clear that it is not needed to study this book. It is my 
strong belief that the fundamental ideas of econometrics can be conveyed without the use of matrix algebra. 
However, for the benefit of the mathematically inclined student, Appendix C gives the summary of basic 
regression theory in matrix notation. For these students, Appendix B provides a succinct summary of the 
main results from matrix algebra. 


I.6 The Role of the Computer


Regression analysis, the bread-and-butter tool of econometrics, these days is unthinkable without the 
computer and some access to statistical software. (Believe me, I grew up in the generation of the slide rule!) 
Fortunately, several excellent regression packages are commercially available, both for the mainframe and 
the microcomputer, and the list is growing by the day. Regression software packages, such as ET, LIMDEP, 
SHAZAM, MICRO TSP, MINITAB, EVIEWS, SAS, SPSS, STATA, Microfit, PcGive, and BMD have
most of the econometric techniques and tests discussed in this book. 

In this book, from time to time, the reader will be asked to conduct Monte Carlo experiments using one 
or more of the statistical packages. Monte Carlo experiments are “fun” exercises that will enable the reader to 
appreciate the properties of several statistical methods discussed in this book. The details of the Monte Carlo 
experiments will be discussed at appropriate places. 
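To give a flavor of such an experiment before the formal treatment, here is a minimal sketch in Python. Everything in it is an illustrative assumption of ours rather than an experiment from the text: the true model Y = 2 + 0.5X + u, the X values 1 to 20 held fixed from sample to sample, the N(0, 1) error term, and the 2,000 replications. Each replication draws a fresh sample, estimates the slope by least squares, and the estimates are then averaged; if the method is unbiased, the average should sit close to the true slope of 0.5.

```python
import random


def ols_slope(x, y):
    """Least-squares slope: sum of (x - xbar)(y - ybar) divided by sum of (x - xbar)**2."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def monte_carlo_mean_slope(beta1=2.0, beta2=0.5, n_reps=2000, seed=42):
    """Average least-squares slope over n_reps samples drawn from Y = beta1 + beta2*X + u.

    The X values (hypothetical, 1 to 20) stay the same in every sample;
    only the error term u, drawn from N(0, 1), changes from sample to sample.
    """
    rng = random.Random(seed)
    x = [float(v) for v in range(1, 21)]
    slopes = []
    for _ in range(n_reps):
        y = [beta1 + beta2 * xi + rng.gauss(0.0, 1.0) for xi in x]
        slopes.append(ols_slope(x, y))
    return sum(slopes) / n_reps


print(f"true slope 0.5, average estimate {monte_carlo_mean_slope():.3f}")
```

Re-running with a different seed or more replications shows the average estimate staying near 0.5, which is the kind of property these experiments are meant to make tangible.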


I.7 Suggestions for Further Reading


The topic of econometric methodology is vast and controversial. For those interested in this topic, I suggest
the following books: 

Neil de Marchi and Christopher Gilbert, eds., History and Methodology of Econometrics, Oxford 
University Press, New York, 1989. This collection of readings discusses some early work on econometric 
methodology and has an extended discussion of the British approach to econometrics relating to time series 
data, that is, data collected over a period of time. 

Wojciech W. Charemza and Derek F. Deadman, New Directions in Econometric Practice: General to 
Specific Modelling, Cointegration and Vector Autoregression, 2d ed., Edward Elgar Publishing Ltd., Hants,
England, 1997. The authors of this book critique the traditional approach to econometrics and give a detailed 
exposition of new approaches to econometric methodology. 

Adrian C. Darnell and J. Lynne Evans, The Limits of Econometrics, Edward Elgar Publishing Ltd., Hants, 
England, 1990. The book provides a somewhat balanced discussion of the various methodological approaches 
to econometrics, with renewed allegiance to traditional econometric methodology. 

Mary S. Morgan, The History of Econometric Ideas, Cambridge University Press, New York, 1990. The 
author provides an excellent historical perspective on the theory and practice of econometrics, with an in-depth 
discussion of the early contributions of Haavelmo (1989 Nobel Laureate in Economics) to econometrics. In
the same spirit, David F. Hendry and Mary S. Morgan, The Foundations of Econometric Analysis, Cambridge
University Press, U.K., 1995, have collected seminal writings in econometrics to show the evolution of
econometric ideas over time.

David Colander and Reuven Brenner, eds., Educating Economists, University of Michigan Press, Ann 
Arbor, Michigan, 1992. This text presents a critical, at times agnostic, view of economic teaching and practice. 

For Bayesian statistics and econometrics, the following books are very useful: John D. Hey, Data in
Doubt, Basil Blackwell Ltd., Oxford University Press, England, 1985; Peter M. Lee, Bayesian Statistics:
An Introduction, Oxford University Press, England, 1989; and Dale J. Poirier, Intermediate Statistics and
Econometrics: A Comparative Approach, MIT Press, Cambridge, Massachusetts, 1995. Arnold Zellner, An
Introduction to Bayesian Inference in Econometrics, John Wiley & Sons, New York, 1971, is an advanced 
reference book. Another advanced reference book is the Palgrave Handbook of Econometrics: Volume 1: 
Econometric Theory, edited by Terence C. Mills and Kerry Patterson, Palgrave Macmillan, New York, 2007. 


SINGLE-EQUATION REGRESSION MODELS


Part 1 of this text introduces single-equation regression models. In these models, one variable, called the 
dependent variable, is expressed as a linear function of one or more other variables, called the explanatory 
variables. In such models it is assumed implicitly that causal relationships, if any, between the dependent 
and explanatory variables flow in one direction only, namely, from the explanatory variables to the dependent 
variable. 

In Chapter 1, we discuss the historical as well as the modern interpretation of the term regression and 
illustrate the difference between the two interpretations with several examples drawn from economics and 
other fields. 

In Chapter 2, we introduce some fundamental concepts of regression analysis with the aid of the two- 
variable linear regression model, a model in which the dependent variable is expressed as a linear function of 
only a single explanatory variable. 

In Chapter 3, we continue to deal with the two-variable model and introduce what is known as the clas- 
sical linear regression model, a model that makes several simplifying assumptions. With these assumptions, 
we introduce the method of ordinary least squares (OLS) to estimate the parameters of the two-variable 
regression model. The method of OLS is simple to apply, yet it has some very desirable statistical properties. 

In Chapter 4, we introduce the (two-variable) classical normal linear regression model, a model that
assumes that the random dependent variable follows the normal probability distribution. With this assumption,
the OLS estimators obtained in Chapter 3 possess stronger statistical properties than they have in the
nonnormal classical linear regression model, properties that enable us to engage in statistical inference, namely,
hypothesis testing.

Chapter 5 is devoted to the topic of hypothesis testing. In this chapter, we try to find out whether the esti- 
mated regression coefficients are compatible with the hypothesized values of such coefficients, the hypoth- 
esized values being suggested by theory and/or prior empirical work. 
Chapter 6 considers some extensions of the two-variable regression model. In particular, it discusses topics
such as (1) regression through the origin, (2) scaling and units of measurement, and (3) functional forms
of regression models such as double-log, semilog, and reciprocal models.

In Chapter 7, we consider the multiple regression model, a model in which there is more than one explana- 
tory variable, and show how the method of OLS can be extended to estimate the parameters of such models. 

In Chapter 8, we extend the concepts introduced in Chapter 5 to the multiple regression model and point 
out some of the complications arising from the introduction of several explanatory variables. 

Chapter 9 on dummy, or qualitative, explanatory variables concludes Part 1 of the text. This chapter em- 
phasizes that not all explanatory variables need to be quantitative (i.e., ratio scale). Variables, such as gender, 
race, religion, nationality, and region of residence, cannot be readily quantified, yet they play a valuable role 
in explaining many an economic phenomenon. 


CHAPTER 1

The Nature of Regression Analysis


As mentioned in the Introduction, regression is a main tool of econometrics, and in this chapter we consider 
very briefly the nature of this tool. 


1.1 Historical Origin of the Term Regression 


The term regression was introduced by Francis Galton. In a famous paper, Galton found that, although there 
was a tendency for tall parents to have tall children and for short parents to have short children, the average 
height of children born of parents of a given height tended to move or “regress” toward the average height 
in the population as a whole.1 In other words, the height of the children of unusually tall or unusually short
parents tends to move toward the average height of the population. Galton’s law of universal regression was
confirmed by his friend Karl Pearson, who collected more than a thousand records of heights of members of
family groups.2 He found that the average height of sons of a group of tall fathers was less than their fathers’
height and the average height of sons of a group of short fathers was greater than their fathers’ height, thus 
“regressing” tall and short sons alike toward the average height of all men. In the words of Galton, this was 
“regression to mediocrity.” 


1.2 The Modern Interpretation of Regression 


The modern interpretation of regression is, however, quite different. Broadly speaking, we may say 


Regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one 
or more other variables, the explanatory variables, with a view to estimating and/or predicting the (population) 
mean or average value of the former in terms of the known or fixed (in repeated sampling) values of the latter. 


1 Francis Galton, “Family Likeness in Stature,” Proceedings of Royal Society, London, vol. 40, 1886, pp. 42-72.
2 K. Pearson and A. Lee, “On the Laws of Inheritance,” Biometrika, vol. 2, Nov. 1903, pp. 357-462.
The full import of this view of regression analysis will become clearer as we progress, but a few simple
examples will make the basic concept quite clear. 


Examples 


1. Reconsider Galton’s law of universal regression. Galton was interested in finding out why there was 
a stability in the distribution of heights in a population. But in the modern view our concern is not with this 
explanation but rather with finding out how the average height of sons changes, given the fathers’ height. In 
other words, our concern is with predicting the average height of sons knowing the height of their fathers. To 
see how this can be done, consider Figure 1.1, which is a scatter diagram, or scattergram. This figure shows 
the distribution of heights of sons in a hypothetical population corresponding to the given or fixed values of 
the father’s height. Notice that corresponding to any given height of a father is a range or distribution of the 
heights of the sons. However, notice that despite the variability of the height of sons for a given value of 
father’s height, the average height of sons generally increases as the height of the father increases. To show 
this clearly, the circled crosses in the figure indicate the average height of sons corresponding to a given height
of the father. Connecting these averages, we obtain the line shown in the figure. This line, as we shall see, is
known as the regression line.3 It shows how the average height of sons increases with the father’s height.

2. Consider the scattergram in Figure 1.2, which gives the distribution in a hypothetical population of
heights of boys measured at fixed ages. Corresponding to any given age, we have a range, or distribution, of
heights. Obviously, not all boys of a given age are likely to have identical heights. But height on the average
increases with age (of course, up to a certain age), which can be seen clearly if we draw a line (the regression
line) through the circled points that represent the average height at the given ages. Thus, knowing the age, we
may be able to predict from the regression line the average height corresponding to that age.

Figure 1.1 Hypothetical distribution of sons’ heights corresponding to given heights of fathers.

[Figure: scattergram of height, inches (vertical axis, about 40 to 70) against age, years (horizontal axis, 10 to 14); circled points mark the mean value at each age.]
Figure 1.2 Hypothetical distribution of heights corresponding to selected ages.

3 At this stage of the development of the subject matter, we shall call this regression line simply the line connecting the mean,
or average, value of the dependent variable (son’s height) corresponding to the given value of the explanatory variable (father’s
height). Note that this line has a positive slope but the slope is less than 1, which is in conformity with Galton’s regression
to mediocrity. (Why?)

3. Turning to economic examples, an economist may be interested in studying the dependence of personal 
consumption expenditure on after-tax or disposable real personal income. Such an analysis may be helpful in
estimating the marginal propensity to consume (MPC), that is, the average change in consumption expenditure
for, say, a rupee’s worth of change in real income (see Figure 1.3). 

4. A monopolist who can fix the price or output (but not both) may want to find out the response of the 
demand for a product to changes in price. Such an experiment may enable the estimation of the price elasticity 
(i.e., price responsiveness) of the demand for the product and may help determine the most profitable price. 


[Figure: rate of change of money wages plotted against the unemployment rate, %.]
Figure 1.3 Hypothetical Phillips curve.
5. A labor economist may want to study the rate of change of money wages in relation to the unemployment
rate. The historical data are shown in the scattergram given in Figure 1.3. The curve in Figure 1.3 is
an example of the celebrated Phillips curve relating changes in the money wages to the unemployment rate.
Such a scattergram may enable the labor economist to predict the average change in money wages given a
certain unemployment rate. Such knowledge may be helpful in stating something about the inflationary
process in an economy, for increases in money wages are likely to be reflected in increased prices.

6. From monetary economics it is known that, other things remaining the same, the higher the rate of inflation
π, the lower the proportion k of their income that people would want to hold in the form of money, as
depicted in Figure 1.4. The slope of this line represents the change in k given a change in the inflation rate. A
quantitative analysis of this relationship will enable the monetary economist to predict the amount of money,
as a proportion of their income, that people would want to hold at various rates of inflation.


[Figure: k = Money/Income (vertical axis) plotted against the inflation rate π (horizontal axis).]
Figure 1.4 Money holding in relation to the inflation rate π.

7. The marketing director of a company may want to know how the demand for the company’s product is 
related to, say, advertising expenditure. Such a study will be of considerable help in finding out the elasticity 
of demand with respect to advertising expenditure, that is, the percent change in demand in response to, say, 
a 1 percent change in the advertising budget. This knowledge may be helpful in determining the “optimum”
advertising budget. 

8. Finally, an agronomist may be interested in studying the dependence of a particular crop yield, say, of 
wheat, on temperature, rainfall, amount of sunshine, and fertilizer. Such a dependence analysis may enable 
the prediction or forecasting of the average crop yield, given information about the explanatory variables. 
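Examples 1 and 2 describe the regression line as the line joining the average value of the dependent variable at each fixed value of the explanatory variable. A minimal Python sketch of that idea follows; the (father, son) height pairs are invented for illustration and are not Galton’s or Pearson’s data.

```python
from collections import defaultdict

# Hypothetical (father's height, son's height) pairs in inches; invented for illustration
pairs = [
    (64, 65), (64, 66), (64, 67),
    (66, 66), (66, 67), (66, 68),
    (68, 67), (68, 68), (68, 70),
    (70, 68), (70, 69), (70, 70),
    (72, 69), (72, 70), (72, 71),
]


def conditional_means(data):
    """Mean of the dependent variable (son's height) at each fixed value of the explanatory variable."""
    groups = defaultdict(list)
    for father, son in data:
        groups[father].append(son)
    return {father: sum(sons) / len(sons) for father, sons in sorted(groups.items())}


means = conditional_means(pairs)
for father, avg_son in means.items():
    print(f"father {father} in: average son {avg_son:.2f} in")
```

For these invented numbers the averages rise from 66 to 70 inches as fathers’ heights go from 64 to 72 inches, that is, by half an inch per inch, so the line joining them slopes upward but with a slope less than 1, the pattern Galton called regression to mediocrity.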

The reader can supply scores of such examples of the dependence of one variable on one or more other 
variables. The techniques of regression analysis discussed in this text are specially designed to study such 
dependence among variables. 
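The economic examples above, such as the MPC in Example 3 and the advertising elasticity in Example 7, both come down to estimating a slope. As a sketch under our own assumptions, the Python code below fits a straight line by least squares (a method this text develops formally only in Chapter 3). All the income, consumption, advertising, and demand figures are hypothetical, and reading the elasticity off the slope of a double-log (ln-ln) line assumes a constant-elasticity demand function.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope of the line y = b1 + b2 * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sxy / sxx
    return ybar - b2 * xbar, b2


# Example 3 (hypothetical data): consumption expenditure against real disposable income, in rupees
income = [100, 120, 140, 160, 180, 200]
consumption = [90, 105, 118, 132, 148, 160]
_, mpc = fit_line(income, consumption)  # slope = MPC

# Example 7 (hypothetical data): demand generated as 500 * advertising**0.2,
# so the true elasticity of demand with respect to advertising is 0.2
advertising = [10, 20, 40, 80, 160]
demand = [500 * a ** 0.2 for a in advertising]
# In the double-log line ln(demand) = b1 + b2 * ln(advertising), the slope b2 is the elasticity
_, elasticity = fit_line([math.log(a) for a in advertising],
                         [math.log(d) for d in demand])

print(f"MPC = {mpc:.3f}, advertising elasticity = {elasticity:.3f}")
```

With these made-up numbers the estimated MPC is about 0.70 (roughly 70 paise of extra consumption per extra rupee of income), and the elasticity comes out at 0.2: a 1 percent increase in the advertising budget raises demand by about 0.2 percent.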
1.3 Statistical versus Deterministic Relationships


From the examples cited in Section 1.2, the reader will notice that in regression analysis we are concerned 
with what is known as the statistical, not functional or deterministic, dependence among variables, such 
as those of classical physics. In statistical relationships among variables we essentially deal with random 
or stochastic4 variables, that is, variables that have probability distributions. In functional or deterministic
dependency, on the other hand, we also deal with variables, but these variables are not random or stochastic. 

The dependence of crop yield on temperature, rainfall, sunshine, and fertilizer, for example, is statistical 
in nature in the sense that the explanatory variables, although certainly important, will not enable the agrono- 
mist to predict crop yield exactly because of errors involved in measuring these variables as well as a host of 
other factors (variables) that collectively affect the yield but may be difficult to identify individually. Thus, 
there is bound to be some “intrinsic” or random variability in the dependent variable, crop yield, that cannot
be fully explained no matter how many explanatory variables we consider.

In deterministic phenomena, on the other hand, we deal with relationships of the type, say, exhibited by 
Newton’s law of gravity, which states: Every particle in the universe attracts every other particle with a force 
directly proportional to the product of their masses and inversely proportional to the square of the distance 
between them. Symbolically, $F = k\left(\frac{m_1 m_2}{r^2}\right)$, where F = force, $m_1$ and $m_2$ are the masses of the two particles,
r = distance, and k = constant of proportionality. Another example is Ohm’s law, which states: For metallic
conductors over a limited range of temperature the current C is proportional to the voltage V; that is,
$C = \left(\frac{1}{k}\right)V$, where $1/k$ is the constant of proportionality. Other examples of such deterministic relationships are
Boyle’s gas law, Kirchhoff’s law of electricity, and Newton’s law of motion. 

In this text we are not concerned with such deterministic relationships. Of course, if there are errors of 
measurement, say, in the k of Newton’s law of gravity, the otherwise deterministic relationship becomes a 
statistical relationship. In this situation, force can be predicted only approximately from the given value of k
(and $m_1$, $m_2$, and r), which contains errors. The variable F in this case becomes a random variable.


1.4 Regression versus Causation


Although regression analysis deals with the dependence of one variable on other variables, it does not neces- 
sarily imply causation. In the words of Kendall and Stuart, “A statistical relationship, however strong and 
however suggestive, can never establish causal connection: our ideas of causation must come from outside 
statistics, ultimately from some theory or other.”5

In the crop-yield example cited previously, there is no statistical reason to assume that rainfall does not 
depend on crop yield. The fact that we treat crop yield as dependent on rainfall (among other things) is due to 
nonstatistical considerations: Common sense suggests that the relationship cannot be reversed, for we cannot 
control rainfall by varying crop yield. 

In all the examples cited in Section 1.2 the point to note is that a statistical relationship in itself cannot 
logically imply causation. To ascribe causality, one must appeal to a priori or theoretical considerations. 
Thus, in the third example cited, one can invoke economic theory in saying that consumption expenditure 
depends on real income.6


4The word stochastic comes from the Greek word stokhos meaning “a bull’s eye.” The outcome of throwing darts on a dart 
board is a stochastic process, that is, a process fraught with misses. 

5M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, Charles Griffin Publishers, New York, vol. 2, 1961, chap. 26, 
p. 279. 

6But as we shall see in Chapter 3, classical regression analysis is based on the assumption that the model used in the analysis 
is the correct model. Therefore, the direction of causality may be implicit in the model postulated. 
1.5 Regression versus Correlation


Closely related to but conceptually very much different from regression analysis is correlation analysis, 
where the primary objective is to measure the strength or degree of linear association between two variables. 
The correlation coefficient, which we shall study in detail in Chapter 3, measures this strength of (linear) 
association. For example, we may be interested in finding the correlation (coefficient) between smoking and 
lung cancer, between scores on statistics and mathematics examinations, between high school grades and 
college grades, and so on. In regression analysis, as already noted, we are not primarily interested in such 
a measure. Instead, we try to estimate or predict the average value of one variable on the basis of the fixed 
values of other variables. Thus, we may want to know whether we can predict the average score on a statistics 
examination by knowing a student’s score on a mathematics examination. 

Regression and correlation have some fundamental differences that are worth mentioning. In regression 
analysis there is an asymmetry in the way the dependent and explanatory variables are treated. The dependent 
variable is assumed to be statistical, random, or stochastic, that is, to have a probability distribution. The
explanatory variables, on the other hand, are assumed to have fixed values (in repeated sampling),7 which was
made explicit in the definition of regression given in Section 1.2. Thus, in Figure 1.2 we assumed that the 
variable age was fixed at given levels and height measurements were obtained at these levels. In correlation 
analysis, on the other hand, we treat any (two) variables symmetrically; there is no distinction between the 
dependent and explanatory variables. After all, the correlation between scores on mathematics and statistics 
examinations is the same as that between scores on statistics and mathematics examinations. Moreover, 
both variables are assumed to be random. As we shall see, most of the correlation theory is based on the 
assumption of randomness of variables, whereas most of the regression theory to be expounded in this book 
is conditional upon the assumption that the dependent variable is stochastic but the explanatory variables are 
fixed or nonstochastic.8
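The symmetry of correlation and the asymmetry of regression can be checked numerically. The Python sketch below uses invented mathematics and statistics examination scores (our own illustrative numbers). The correlation coefficient is the same whichever variable is listed first, but the least-squares slope from regressing statistics scores on mathematics scores differs from the slope of the reverse regression. The two slopes are still linked: their product equals the squared correlation coefficient.

```python
import math


def _sums(x, y):
    """Return (Sxy, Sxx, Syy): sums of cross-products and squares of deviations from the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy


def correlation(x, y):
    """Correlation coefficient r: treats x and y symmetrically."""
    sxy, sxx, syy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def regression_slope(y, x):
    """Least-squares slope from regressing y (dependent) on x (explanatory)."""
    sxy, sxx, _ = _sums(x, y)
    return sxy / sxx


math_scores = [55, 60, 65, 70, 75, 80, 85]  # hypothetical
stat_scores = [50, 62, 60, 72, 70, 83, 80]  # hypothetical

r_ms = correlation(math_scores, stat_scores)
r_sm = correlation(stat_scores, math_scores)
b_stat_on_math = regression_slope(stat_scores, math_scores)
b_math_on_stat = regression_slope(math_scores, stat_scores)

print(f"r(math, stat) = {r_ms:.4f}, r(stat, math) = {r_sm:.4f}")
print(f"slope of stat on math = {b_stat_on_math:.4f}, slope of math on stat = {b_math_on_stat:.4f}")
```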


1.6 Terminology and Notation 


Before we proceed to a formal analysis of regression theory, let us dwell briefly on the matter of terminology 
and notation. In the literature the terms dependent variable and explanatory variable are described variously. 
A representative list is given on the next page:

Although it is a matter of personal taste and tradition, in this text we will use the dependent variable/ex- 
planatory variable or the more neutral regressand and regressor terminology. 

If we are studying the dependence of a variable on only a single explanatory variable, such as that of con- 
sumption expenditure on real income, such a study is known as simple, or two-variable, regression analysis. 
However, if we are studying the dependence of one variable on more than one explanatory variable, as in the 
crop-yield, rainfall, temperature, sunshine, and fertilizer example, it is known as multiple regression analy- 
sis. In other words, in two-variable regression there is only one explanatory variable, whereas in multiple 
regression there is more than one explanatory variable. 

The term random is a synonym for the term stochastic. As noted earlier, a random or stochastic variable 
is a variable that can take on any set of values, positive or negative, with a given probability.9


7 It is crucial to note that the explanatory variables may be intrinsically stochastic, but for the purpose of regression analysis
we assume that their values are fixed in repeated sampling (that is, X assumes the same values in various samples), thus
rendering them in effect nonrandom or nonstochastic. But more on this in Chapter 3, Sec. 3.2.


8 In advanced treatments of econometrics, one can relax the assumption that the explanatory variables are nonstochastic
(see the introduction to Part 2).


9 See Appendix A for formal definition and further details.
Controlled variable Control variable


Unless stated otherwise, the letter Y will denote the dependent variable and the X’s ($X_1, X_2, \ldots, X_k$) will
denote the explanatory variables, $X_k$ being the kth explanatory variable. The subscript i or t will denote
the ith or the tth observation or value. $X_{ki}$ (or $X_{kt}$) will denote the ith (or tth) observation on variable $X_k$. N
(or T) will denote the total number of observations or values in the population, and n (or t) the total number
of observations in a sample. As a matter of convention, the observation subscript i will be used for 
cross-sectional data (i.e., data collected at one point in time) and the subscript t will be used for time series 
data (i.e., data collected over a period of time). The nature of cross-sectional and time series data, as well as the 
important topic of the nature and sources of data for empirical analysis, is discussed in the following section. 


1.7 The Nature and Sources of Data for Economic Analysis10


The success of any econometric analysis ultimately depends on the availability of the appropriate data. It is 
therefore essential that we spend some time discussing the nature, sources, and limitations of the data that one 
may encounter in empirical analysis. 


Types of Data 


Three types of data may be available for empirical analysis: time series, cross-section, and pooled (i.e., 
combination of time series and cross-section) data. 


Time Series Data 

The data shown in Table I.1 of the Introduction are an example of time series data. A time series is a set of 
observations on the values that a variable takes at different times. Such data may be collected at regular time 
intervals, such as daily (e.g., stock prices, weather reports), weekly (e.g., money supply figures), monthly 


10 For an informative account, see Michael D. Intriligator, Econometric Models, Techniques, and Applications, Prentice Hall,
Englewood Cliffs, N.J., 1978, chap. 3. 
(e.g., the unemployment rate, the Consumer Price Index [CPI]), quarterly (e.g., GDP), annually (e.g.,
government budgets), quinquennially, that is, every 5 years (e.g., the census of manufactures), or decennially,
that is, every 10 years (e.g., the census of population). Sometimes data are available both quarterly and
annually, as in the case of the data on GDP and consumer expenditure. With the advent of high-speed
computers, data can now be collected over an extremely short interval of time, such as the data on stock prices, which
can be obtained literally continuously (the so-called real-time quote).

Although time series data are used heavily in econometric studies, they present special problems for 
econometricians. As we will show in chapters on time series econometrics later on, most empirical work 
based on time series data assumes that the underlying time series is stationary. Although it is too early to 
introduce the precise technical meaning of stationarity at this juncture, loosely speaking, a time series is
stationary if its mean and variance do not vary systematically over time. To see what this means, consider
Figure 1.5, which depicts the behavior of the M1 money supply in the United States from January 1, 1959, to
September, 1999. (The actual data are given in Exercise 1.4.) As you can see from this figure, the M1 money
supply shows a steady upward trend as well as variability over the years, suggesting that the M1 time series
is not stationary.11 We will explore this topic fully in Chapter 21.
Figure 1.5 M1 money supply: United States, 1951:01-1999:09.


Cross-Section Data 


Cross-section data are data on one or more variables collected at the same point in time, such as the census
of population conducted by the Government of India every 10 years, the survey of household consumer
expenditure in India conducted by the National Sample Survey Organization (NSSO), and the opinion polls by the
Times of India, NDTV, CNN-IBN, and umpteen other organizations. A concrete example of cross-sectional data is
given in Table 1.1. This table gives data on labour productivity and wages for 27 states in India for 2007-08
and 2008-09. For each year the data on the 27 states are cross-sectional data. Thus, in Table 1.1 we have two
cross-sectional samples.

11 To see this more clearly, we divided the data into four time periods: 1951:01 to 1962:12; 1963:01 to 1974:12; 1975:01
to 1986:12; and 1987:01 to 1999:09. For these subperiods the mean values of the money supply (with corresponding
standard deviations in parentheses) were, respectively, 165.88 (23.27), 323.20 (72.66), 788.12 (195.43), and 1099 (27.84),
all figures in billions of dollars. This is a rough indication of the fact that the money supply over the entire period was not
stationary.

Just as time series data create their own special problems (because of the stationarity issue), cross-sectional data too have their own problems, specifically the problem of heterogeneity. From the data given in Table 1.1 we see that some states show high labour productivity (e.g., Goa, Gujarat) and some show very low labour productivity (e.g., Manipur). When we include such heterogeneous units in a statistical analysis, the size or scale effect must be taken into account so as not to mix apples with oranges. To see this clearly, we plot in Figure 1.6 the data on labour productivity and wages given to the workers in the 27 states of India for the year 2008-09. The figure shows how widely scattered the observations are. In Chapter 11 we will see how the scale effect can be an important factor in assessing relationships among economic variables.
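To put a number on "how widely scattered", the sketch below takes five rows of X2 (wages given to workers in 2008-09, Rs. lakhs) from Table 1.1 and computes the ratio of the largest to the smallest value and the coefficient of variation. The choice of rows and of these two summary measures is ours, purely for illustration; nothing here corrects for the scale effect, which is the subject of Chapter 11.

```python
from statistics import mean, stdev

# A handful of X2 values (wages given to workers, 2008-09, Rs. lakhs) from Table 1.1.
wages_0809 = {
    "Andaman & N Island": 235.00,
    "Manipur": 609.00,
    "Goa": 36531.00,
    "Tamil Nadu": 850633.00,
    "Maharashtra": 972674.00,
}

values = list(wages_0809.values())
spread_ratio = max(values) / min(values)   # largest unit relative to smallest unit
coef_var = stdev(values) / mean(values)    # dispersion relative to the average

print(f"max/min = {spread_ratio:.0f}, coefficient of variation = {coef_var:.2f}")
```

A max/min ratio in the thousands is a quick signal that the units differ enormously in size, which is exactly when scale matters.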
Figure 1.6 Relationship between labour productivity and wages, 2008-09


Table 1.1 Labour Productivity in Indian Industries


States                  Y1           Y2           X1           X2
Andaman & N Island      42.00        51.02        189.00       235.00
Andhra Pradesh          21.17        23.35        422911.00    487449.00
Assam                   27.44        29.10        42729.00     54440.00
Bihar                   35.10        46.99        22891.00     29038.00
Chandigarh              40.52        49.89        6073.00      5989.00
Chattisgarh             49.02        60.21        86925.00     91071.00
Dadra & Nagar Haveli    62.05        91.08        32779.00     41403.00
Daman & Diu             45.05        43.91        29381.00     34970.00
Delhi                   30.29        31.38        50572.00     52452.00
Goa                     50.17        67.89        33052.00     36531.00
Gujarat                 [illegible]  56.27        [illegible]  [illegible]
Haryana                 [illegible]  38.25        253746.00    273749.00
Himachal Pradesh        45.78        50.04        39593.00     51202.00
Jammu & Kashmir         38.44        29.96        19402.00     21500.00
Jharkhand               [illegible]  48.24        181976.00    171027.00
Karnataka               32.45        37.76        383165.00    425350.00
Kerala                  18.00        21.42        133235.00    157616.00
Madhya Pradesh          40.57        44.66        123661.00    144693.00
Maharashtra             54.55        58.03        846112.00    972674.00
Manipur                 [illegible]  2.90         569.00       609.00
Meghalaya               37.49        42.04        2950.00      3144.00
Nagaland                5.46         [illegible]  501.00       483.00
Orissa                  33.05        39.78        129957.00    173653.00
Puducherry              39.58        36.99        24060.00     25369.00
Punjab                  22.09        24.45        216337.00    231594.00
Rajasthan               27.51        32.86        138032.00    160873.00
Tamil Nadu              20.68        20.66        644041.00    850633.00
Tripura                 3.70         3.20         3579.00      4474.00
Uttar Pradesh           32.87        34.87        324673.00    359075.00
Uttarakhand             33.85        47.97        70785.00     179307.00
West Bengal             25.98        [illegible]  310289.00    343201.00


Note: Y1 = labour productivity in 2007-08, in Rs. Lakhs per worker (calculated as ratio of value of gross output to number of workers)
Y2 = labour productivity in 2008-09, in Rs. Lakhs per worker
X1 = wages given to workers in 2007-08, in Rs. Lakhs
X2 = wages given to workers in 2008-09, in Rs. Lakhs

Source: Annual Survey of Industries, 2007-08, 2008-09, Central Statistical Organization, Government of India.


Pooled Data 


Pooled, or combined, data contain elements of both time series and cross-section data. The data in Table 1.1 are an example of pooled data. For each year we have 27 cross-sectional observations and for each state we have two time series observations on labour productivity and wages given to workers, a total of 54 pooled (or combined) observations. Likewise, the data given in Exercise 1.1 are pooled data in that the Consumer Price Index (CPI) for each country for 1980-2005 is time series data, whereas the data on the CPI for the seven countries for a single year are cross-sectional data. In the pooled data we have 182 observations: 26 annual observations for each of the seven countries.
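One common way to hold pooled data is "long" form, with one record per (unit, period) pair, so the pooled count is simply the number of records. The sketch below stacks three states' labour productivity from Table 1.1; the `to_long` helper and the dictionary layout are illustrative assumptions, not a prescribed format.

```python
# Wide layout: one entry per state, one value per year (Y1, Y2 from Table 1.1).
productivity = {
    "Assam": {"2007-08": 27.44, "2008-09": 29.10},
    "Bihar": {"2007-08": 35.10, "2008-09": 46.99},
    "Goa":   {"2007-08": 50.17, "2008-09": 67.89},
}

def to_long(wide):
    """Stack a {unit: {period: value}} mapping into (unit, period, value) records."""
    return [
        (unit, period, value)
        for unit, by_period in wide.items()
        for period, value in by_period.items()
    ]

pooled = to_long(productivity)
print(len(pooled))   # 3 states x 2 years = 6 pooled observations
```

With all 27 states the same stacking gives 27 x 2 = 54 records, and the CPI data of Exercise 1.1 would give 7 x 26 = 182.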


Panel, Longitudinal, or Micropanel Data 


This is a special type of pooled data in which the same cross-sectional unit (say, a family or a firm) is surveyed over time. For example, the U.S. Department of Commerce carries out a census of housing at periodic intervals. At each periodic survey the same household (or the people living at the same address) is interviewed to find out if there has been any change in the housing and financial conditions of that household since the last survey. Because the same household is interviewed periodically, panel data provide very useful information on the dynamics of household behavior, as we shall see in Chapter 16.
As a concrete example, consider the data given in Table 1.2. The data in the table, originally collected by Y. Grunfeld, refer to the real investment, the real value of the firm, and the real capital stock of four U.S. companies, namely, General Electric (GE), U.S. Steel (US), General Motors (GM), and Westinghouse (WEST), for the period 1935-1954.¹² Since the data are for several companies collected over a number of years, this is a classic example of panel data. In this table, the number of observations for each company is the same, but this is not always the case. If all the companies have the same number of observations, we have what is called a balanced panel. If the number of observations is not the same for each company, it is called an unbalanced panel. In Chapter 16, Panel Data Regression Models, we will examine such data and show how to estimate such models.
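The balanced/unbalanced distinction above comes down to counting observations per unit. A minimal sketch, assuming the panel is stored as (unit, period) pairs; `is_balanced` is a hypothetical helper that follows the text's definition (equal counts per company).

```python
from collections import Counter

def is_balanced(records):
    """Return True if every unit appears the same number of times.

    records: iterable of (unit, period) pairs, one pair per observation.
    """
    counts = Counter(unit for unit, _period in records)
    # balanced <=> all units share a single observation count
    return len(set(counts.values())) <= 1

# Layout of Table 1.2: four firms, each observed every year 1935-1954.
grunfeld_like = [(firm, year)
                 for firm in ("GE", "US", "GM", "WEST")
                 for year in range(1935, 1955)]

print(is_balanced(grunfeld_like))        # True: 20 observations per firm
print(is_balanced(grunfeld_like[:-1]))   # False: WEST loses its 1954 observation
```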

Grunfeld’s purpose in collecting these data was to find out how real gross investment (I) depends on the real value of the firm (F) a year earlier and real capital stock (C) a year earlier. Since the companies included in the sample operate in the same capital market, by studying them together Grunfeld wanted to find out if they had similar investment functions.
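Because Grunfeld relates investment to F and C "a year earlier", each observation pairs this year's I with last year's F and C, and the first year of each company is lost. The sketch below shows that lagging step on three GE rows from Table 1.2; `lagged_design` is a hypothetical helper, and the estimation itself is left to Chapter 16.

```python
def lagged_design(years, invest, value, capital):
    """Pair investment in year t with firm value and capital stock in year t-1.

    Returns (year, I_t, F_{t-1}, C_{t-1}) tuples; the first year has no
    previous-year values and is therefore dropped.
    """
    return [
        (years[t], invest[t], value[t - 1], capital[t - 1])
        for t in range(1, len(years))
    ]

# GE, 1942-1944, taken from Table 1.2
years = [1942, 1943, 1944]
invest = [91.9, 61.3, 56.8]        # I
value = [1588.0, 1749.4, 1687.2]   # F
capital = [287.8, 319.9, 321.3]    # C

rows = lagged_design(years, invest, value, capital)
for row in rows:
    print(row)
# (1943, 61.3, 1588.0, 287.8)
# (1944, 56.8, 1749.4, 319.9)
```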


The Sources of Data¹³


The data used in empirical analysis may be collected by a governmental agency (e.g., the Central Statistical Organization), an international agency (e.g., the International Monetary Fund [IMF] or the World Bank), a private organization (e.g., the Centre for Monitoring Indian Economy), or an individual. There are literally thousands of such agencies collecting data for one purpose or another.


The Internet 


The Internet has revolutionized data gathering. If you just “surf the net” with a keyword (e.g., exchange rates), you will be swamped with all kinds of data sources. In Appendix E we list some of the frequently visited websites that provide economic and financial data of all sorts. Most of the data can be downloaded at little or no cost. You may want to bookmark the websites that are likely to provide you with useful economic data.

The data collected by various agencies may be experimental or nonexperimental. Experimental data are often collected in the natural sciences: the investigator collects data while holding certain factors constant in order to assess the impact of other factors on a given phenomenon. For instance, in assessing the impact of obesity on blood pressure, the researcher would want to collect data while holding constant the eating, smoking, and drinking habits of the people in order to minimize the influence of these variables on blood pressure.

In the social sciences, the data that one generally encounters are nonexperimental in nature, that is, not subject to the control of the researcher.¹⁴ For example, the data on GNP, unemployment, stock prices, etc., are not directly under the control of the investigator. As we shall see, this lack of control often creates special problems for the researcher in pinning down the exact cause or causes affecting a particular situation. For example, is it the money supply that determines the (nominal) GDP or is it the other way around?


¹²Y. Grunfeld, “The Determinants of Corporate Investment,” unpublished PhD thesis, Department of Economics, University of Chicago, 1958. These data have become a workhorse for illustrating panel data regression models.

¹³For an illuminating account, see Albert T. Somers, The U.S. Economy Demystified: What the Major Economic Statistics Mean and their Significance for Business, D.C. Heath, Lexington, Mass., 1985.

¹⁴In the social sciences, too, one can sometimes have a controlled experiment. An example is given in Exercise 1.6.
Table 1.2 Investment Data for Four Companies, 1935-1954


GE                                        US
Year   I            F            C        Year   I       F            C
1935   33.1         1170.6       97.8     1935   209.9   1362.4       53.8
1936   45.0         2015.8       104.4    1936   355.3   1807.1       50.5
1937   [illegible]  2803.3       118.0    1937   469.9   2673.3       118.1
1938   44.6         2039.7       156.2    1938   262.3   1801.9       260.2
1939   48.1         2256.2       172.6    1939   230.4   1957.3       312.7
1940   74.4         2132.2       186.6    1940   361.6   2202.9       254.2
1941   113.0        1834.1       220.9    1941   472.8   2380.5       261.4
1942   91.9         1588.0       287.8    1942   445.6   2168.6       298.7
1943   61.3         1749.4       319.9    1943   361.6   1985.1       301.8
1944   56.8         1687.2       321.3    1944   288.2   1813.9       279.1
1945   93.6         2007.7       319.6    1945   258.7   1850.2       213.8
1946   159.9        2208.3       346.0    1946   420.3   2067.7       232.6
1947   147.2        1656.7       456.4    1947   420.5   1796.7       264.8
1948   146.3        1604.4       543.4    1948   494.5   1625.8       306.9
1949   98.3         1431.8       618.3    1949   405.1   1667.0       351.1
1950   93.5         1610.5       647.4    1950   418.8   1677.4       357.8
1951   135.2        1819.4       671.3    1951   588.2   2289.5       341.1
1952   157.3        2079.7       726.1    1952   645.2   2159.4       444.2
1953   179.5        2371.6       800.3    1953   641.0   2031.3       623.6
1954   189.6        2759.9       888.9    1954   459.3   2115.5       669.7

GM                                        WEST
Year   I            F            C        Year   I       F            C
1935   317.6        3078.5       2.8      1935   12.93   [illegible]  1.8
1936   391.8        4661.7       52.6     1936   25.90   516.0        0.8
1937   410.6        5387.1       156.9    1937   35.05   729.0        7.4
1938   257.7        2792.2       209.2    1938   22.89   560.4        18.1
1939   330.8        4313.2       203.4    1939   18.84   519.9        23.5
1940   461.2        4643.9       207.2    1940   28.57   628.5        26.5
1941   512.0        4551.2       255.2    1941   48.51   537.1        36.2
1942   448.0        3244.1       303.7    1942   43.34   561.2        60.8
1943   499.6        4053.7       264.1    1943   37.02   617.2        84.4
1944   547.5        4379.3       201.6    1944   37.81   626.7        91.2
1945   561.2        4840.9       265.0    1945   39.27   737.2        92.4
1946   688.1        4900.0       402.2    1946   53.46   760.5        86.0
1947   568.9        3526.5       761.5    1947   55.56   581.4        111.1
1948   529.2        3245.7       922.4    1948   49.56   662.3        130.6
1949   555.1        3700.2       1020.1   1949   32.04   583.8        141.8
1950   642.9        3755.6       1099.0   1950   32.24   635.2        136.7
1951   755.9        4833.0       1207.7   1951   54.38   732.8        129.7
1952   891.2        4924.9       1430.5   1952   71.78   864.1        145.5
1953   1304.4       6241.7       1777.3   1953   90.08   1193.5       174.8
1954   1486.7       5593.6       2226.3   1954   68.60   1188.9       213.5


Notes: Y = I = gross investment = additions to plant and equipment plus maintenance and repairs, in millions of dollars deflated by P1.
X2 = F = value of the firm = price of common and preferred shares at Dec. 31 (or average price of Dec. 31 and Jan. 31 of the following year) times number of common and preferred shares outstanding plus total book value of debt at Dec. 31, in millions of dollars deflated by P2.
X3 = C = stock of plant and equipment = accumulated sum of net additions to plant and equipment deflated by P1 minus depreciation allowance deflated by P3.
P1 = implicit price deflator of producers’ durable equipment (1947 = 100).
P2 = implicit price deflator of GNP (1947 = 100).
P3 = depreciation expense deflator = 10-year moving average of wholesale price index of metals and metal products (1947 = 100).
Source: Reproduced from H. D. Vinod and Aman Ullah, Recent Advances in Regression Methods, Marcel Dekker, New York, 1981, pp. 259-261.
The Accuracy of Data¹⁵


Although plenty of data are available for economic research, the quality of the data is often not good. There are several reasons for this.


1. As noted, most social science data are nonexperimental in nature. Therefore, there is the possibility of 
observational errors, either of omission or commission. 

2. Even in experimentally collected data, errors of measurement arise from approximations and roundoffs.

3. In questionnaire-type surveys, the problem of nonresponse can be serious; a researcher is lucky to get a 40 percent response rate to a questionnaire. Analysis based on such a partial response rate may not truly reflect the behavior of the 60 percent who did not respond, thereby leading to what is known as (sample) selectivity bias. Then there is the further problem that those who do respond to the questionnaire may not answer all the questions, especially questions of a financially sensitive nature, thus leading to additional selectivity bias.

4. The sampling methods used in obtaining the data may vary so widely that it is often difficult to compare 
the results obtained from the various samples. 

5. Economic data are generally available at a highly aggregate level. For example, most macrodata (e.g., GNP, employment, inflation, unemployment) are available for the economy as a whole or at most for some broad geographical regions. Such highly aggregated data may not tell us much about the individuals or microunits that may be the ultimate object of study.

6. Because of confidentiality, certain data can be published only in highly aggregate form. The IRS in the US, for example, is not allowed by law to disclose data on individual tax returns; it can only release some broad summary data. Therefore, if one wants to find out how much individuals with a certain level of income spent on health care, one cannot do so except at a very highly aggregate level. Such macroanalysis often fails to reveal the dynamics of the behavior of the microunits. Similarly, the Department of Commerce, which conducts the census of business every 5 years, is not allowed to disclose information on production, employment, energy consumption, research and development expenditure, etc., at the firm level. It is therefore difficult to study the interfirm differences on these items.
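Point 3 above can be made concrete with a deliberately stark toy example. All numbers are hypothetical, and the assumption that only lower-income households reply is ours, chosen to show how a 40 percent response rate that is related to the variable of interest distorts the average.

```python
from statistics import mean

# Hypothetical population of 100 households with incomes 1..100 (arbitrary units).
population = list(range(1, 101))

# Assumption for illustration only: the 40 lowest-income households respond,
# i.e. a 40 percent response rate in which responding is related to income.
respondents = [income for income in population if income <= 40]

print(len(respondents) / len(population))   # 0.4
print(mean(population), mean(respondents))  # 50.5 20.5
```

The respondents' mean (20.5) badly understates the population mean (50.5), which is the selectivity bias described in point 3. If response were unrelated to income, a 40 percent sample would not be systematically off in this way.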


Because of all of these and many other problems, the researcher should always keep in mind that the results of research are only as good as the quality of the data. Therefore, if in given situations researchers find that the results of the research are “unsatisfactory,” the cause may be not that they used the wrong model but that the quality of the data was poor. Unfortunately, because of the nonexperimental nature of the data used in most social science studies, researchers very often have no choice but to depend on the available data. But they should always keep in mind that the data used may not be the best and should try not to be too dogmatic about the results obtained from a given study, especially when the quality of the data is suspect.


A Note on the Measurement Scales of Variables¹⁶


The variables that we will generally encounter fall into four broad categories: ratio scale, interval scale, ordinal scale, and nominal scale. It is important that we understand each.


¹⁵For a critical review, see O. Morgenstern, The Accuracy of Economic Observations, 2d ed., Princeton University Press, Princeton, N.J., 1963.
¹⁶The following discussion relies heavily on Aris Spanos, Probability Theory and Statistical Inference: Econometric Modeling with Observational Data, Cambridge University Press, New York, 1999, p. 24.
Ratio Scale


For a variable X, taking two values, X1 and X2, the ratio X1/X2 and the distance (X2 − X1) are meaningful quantities. Also, there is a natural ordering (ascending or descending) of the values along the scale. Therefore, comparisons such as X2 ≤ X1 or X2 ≥ X1 are meaningful. Most economic variables belong to this category. Thus, it is meaningful to ask how big this year’s GDP is compared with the previous year’s GDP. Personal income, measured in rupees, is a ratio variable; someone earning Rs. 100,000 is making twice as much as another person earning Rs. 50,000.
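One way to see why the ratio is meaningful for a ratio-scale variable: the scale has a true zero, so changing the unit of measurement (here, rupees to paise, with 1 rupee = 100 paise) multiplies both values by the same constant and leaves their ratio unchanged. A minimal sketch using the two incomes from the text:

```python
def ratio(a, b):
    return a / b

incomes_rupees = (100_000, 50_000)
incomes_paise = tuple(100 * x for x in incomes_rupees)   # change of unit only

print(ratio(*incomes_rupees))  # 2.0
print(ratio(*incomes_paise))   # 2.0: "twice as much" in any unit
```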


Interval Scale 


An interval scale variable satisfies the last two properties of the ratio scale variable but not the first. Thus, the distance between two time periods, say (2000 − 1995), is meaningful, but not the ratio of two time periods (2000/1995). At 3.00 p.m. GMT on February 12, 2011, Srinagar reported a temperature of 5 degrees Celsius, while Gurgaon, New Delhi reached 20 degrees Celsius. Temperature is not measured on a ratio scale, since it does not make sense to claim that New Delhi was 300 percent warmer than Srinagar. This is because 0 degrees on the Celsius scale is not a natural zero: it does not mean the absence of temperature.
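The temperature example can be checked numerically. Converting the two readings to Fahrenheit (F = 9C/5 + 32, a scale with a different arbitrary zero) changes their ratio, whereas the difference merely rescales consistently. That is the practical meaning of "distances are meaningful, ratios are not" on an interval scale.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

srinagar_c, delhi_c = 5.0, 20.0                  # readings quoted in the text
srinagar_f = celsius_to_fahrenheit(srinagar_c)   # 41.0
delhi_f = celsius_to_fahrenheit(delhi_c)         # 68.0

# Ratios depend on the arbitrary zero point of the scale...
ratio_c = delhi_c / srinagar_c    # 4.0
ratio_f = delhi_f / srinagar_f    # about 1.66
# ...but differences agree: 15 Celsius degrees = 27 Fahrenheit degrees.
diff_c = delhi_c - srinagar_c     # 15.0
diff_f = delhi_f - srinagar_f     # 27.0

print(ratio_c, round(ratio_f, 2), diff_c, diff_f)
```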


Ordinal Scale 


A variable belongs to this category only if it satisfies the third property of the ratio scale (i.e., natural ordering). Examples are grading systems (A, B, C grades) or income class (upper, middle, lower). For these variables the ordering exists but the distances between the categories cannot be quantified. Students of economics will recall the indifference curves between two goods. Each higher indifference curve indicates a higher level of utility, but one cannot quantify by how much one indifference curve is higher than the others.
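For an ordinal variable, any numeric coding that preserves the order is equally valid. As a result, orderings are meaningful but differences between codes are not. The sketch below uses two hypothetical codings of the A, B, C grades from the text.

```python
grades = ["C", "A", "B", "A"]

# Two equally valid numeric codings: both preserve the order C < B < A.
coding_1 = {"C": 1, "B": 2, "A": 3}
coding_2 = {"C": 1, "B": 5, "A": 10}

ranks_1 = sorted(grades, key=coding_1.get)
ranks_2 = sorted(grades, key=coding_2.get)
print(ranks_1 == ranks_2)   # True: the ordering is the same under both codings

# ...but the "distances" between grades depend entirely on the arbitrary coding.
print(coding_1["A"] - coding_1["B"], coding_1["B"] - coding_1["C"])  # 1 1
print(coding_2["A"] - coding_2["B"], coding_2["B"] - coding_2["C"])  # 5 4
```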


Nominal Scale 


Variables in this category have none of the features of the ratio scale variables. Variables such as gender (male, female) and marital status (married, unmarried, divorced, separated) simply denote categories. Question: Why can such variables not be expressed on the ratio, interval, or ordinal scales?

As we shall see, econometric techniques that may be suitable for ratio scale variables may not be suitable
for nominal scale variables. Therefore, it is important to bear in mind the distinctions among the four types 
of measurement scales discussed above. 


Summary and Conclusions 


1. The key idea behind regression analysis is the statistical dependence of one variable, the dependent 
variable, on one or more other variables, the explanatory variables. 

2. The objective of such analysis is to estimate and/or predict the mean or average value of the dependent 
variable on the basis of the known or fixed values of the explanatory variables. 

3. In practice the success of regression analysis depends on the availability of the appropriate data. This chapter discussed the nature, sources, and limitations of the data that are generally available for research, especially in the social sciences.

4. In any research, the researcher should clearly state the sources of the data used in the analysis, their definitions, their methods of collection, and any gaps or omissions in the data as well as any revisions in the data. Keep in mind that the macroeconomic data published by the government are often revised.

5. Since the reader may not have the time, energy, or resources to track down the data, the reader has the right to presume that the data used by the researcher have been properly gathered and that the computations and analysis are correct.


Multiple Choice Questions 


1. In a regression analysis the values are fixed for the
a. Explanatory variables
b. Dependent variables
c. All variables
d. None of the variables
2. Regression analysis is concerned with the study of the dependence of
a. Explanatory variables on one or more dependent variables
b. Dependent variable on one or more explanatory variables
c. Both explanatory and dependent variables on other known variables
d. Two known variables
3. Stochastic variables
a. Are deterministic values
b. Are non-random variables
c. Imply causation
d. Have a probability distribution
4. Newton’s law of gravity is an example of
a. A stochastic relationship
b. A statistical relationship
c. A deterministic phenomenon
d. Comparing economics to science
5. Regression analysis
a. Necessarily implies causation
b. Does not necessarily imply causation
c. Always analyses the cause-effect scenario
d. Implies correlation effects
6. A statistical relationship in itself
a. Can help establish causation
b. Can help establish direction of causation
c. Cannot logically imply causation
d. Always shows correlation
7. In correlation analysis we measure the
a. Degree of linear association between two variables
b. Degree of causation between two variables
c. Predictability of the two variables
d. Regression between the two variables
8. The dependent variable in regression analysis is assumed to be
a. Nonstochastic
b. Constant
c. Stochastic
d. Known values
9. The explanatory variables in regression analysis are assumed to be 
a. Nonstochastic
b. Constant 
c. Stochastic 
d. Known values 
10. In correlation analysis the dependent and explanatory variables 
a. Are treated with distinction 
b. Are treated symmetrically 
c. Are treated differently based on individual case 
d. Are regressed 
11. Studying the dependence of a variable on only a single explanatory variable is known as 
a. One-variable regression analysis 
b. Two-variable regression analysis 
c. Three variable regression analysis 
d. Multiple regression analysis 
12. Data collected at a point in time is called 
a. Cross-sectional data 
b. Time series data 
c. Pooled data 
d. Panel data 
13. Data collected for a variable over a period of time is called 
a. Cross-sectional data 
b. Time series data 
c. Pooled data 
d. Panel data 
14. To study the performance of various states across India, the data on state domestic product is collected 
for all states for the period 1990 to 2010. Such a data set represents 
a. Cross-sectional data 
b. Time series data 
c. Pooled data 
d. Panel data 
15. Population census data is an example of 
a. Cross-sectional data 
b. Time series data 
c. Pooled data 
d. Panel data
16. Data collected for the same set of companies over 10 years is an example of 
a. Cross-sectional data 
b. Time series data 
c. Pooled data 
d. Panel data 
17. Firm data collected for the top 10 companies, classified by profitability, for 10 years is an example of
a. Cross-sectional data
b. Time series data
c. Pooled data
d. Panel data
18. The data on GDP, unemployment, and household expenditure are examples of
a. Experimental data
b. Non-experimental data
c. Cross-section data
d. Time series data
19. Variables such as grades in mathematics, results of a horse race, and degree of satisfaction at a restaurant are examples of
a. Ratio scale
b. Interval scale
c. Ordinal scale
d. Nominal scale
20. Variables such as income of an individual, age, and exports of a country are all examples of
a. Ratio scale
b. Interval scale
c. Ordinal scale
d. Nominal scale
21. Variables such as temperature and dates are examples of
a. Ratio scale
b. Interval scale
c. Ordinal scale
d. Nominal scale
22. Variables such as gender, marital status, and colour of the eye are examples of
a. Ratio scale
b. Interval scale
c. Ordinal scale
d. Nominal scale
23. Based on the graph below, select the statement that is true.
[Figure: Price plotted against Demand]
a. As price increases the average demand decreases
b. For every increase in price there is only one possible demand level which is higher than before
c. As demand increases price on average decreases
d. As demand increases price on average increases
24. In India, during the Diwali season, the demand for sweets is very high. Prices of sweets depend on the level of demand for the particular brand of sweet. If we regress the price of sweets on the demand for sweets during this season, we can expect to get a positive relation, as depicted in the figure below. This can be read as
[Figure: Price of sweets plotted against Demand, with a price level P marked on the price axis]
a. Demand for sweets increases as the average price of sweets also increases
b. At higher demand levels for branded sweets, the average price of sweets is also high. Thus the regression reflects the consumer preference for high-priced sweets.
c. If prices decrease, demand will also decrease. Hence sweet shops should compete to price their products higher than others.
d. Price and demand for sweets move together
25. In the figure given in Question 24, the marked price level P represents
a. The minimum price for sweets demanded
b. The reserve price for sweets below which shopkeepers will not sell them
c. The absolute price of sweets set based on the season
d. None of the above


Exercises 


1.1. Table 1.3 gives data on the Consumer Price Index (CPI) for seven industrialized countries with 1982-1984 = 100 as the base of the index. Table 1.4 gives data on the wholesale price index (WPI) for India, a measure of inflation used in India.

a. From the given data, compute the inflation rate for each country.¹⁷

b. Plot the inflation rate for each country against time (i.e., use the horizontal axis for time and the vertical axis for the inflation rate).

c. What broad conclusions can you draw about the inflation experience in the eight countries?

d. Which country’s inflation rate seems to be most variable? Can you offer any explanation?

1.2. a. Using Table 1.3 and Table 1.4, plot the inflation rates of Canada, France, Germany, Italy, Japan, the United Kingdom, and the United States against the inflation rate of India.

b. Comment generally on the behavior of the inflation rate in the seven countries vis-a-vis the Indian inflation rate.

c. If you find that the seven countries’ inflation rates move in the same direction as India’s inflation rate, would that suggest that Indian inflation “causes” inflation in the other countries? Why or why not?


Table 1.3 CPI in Seven Industrial Countries, 1980-2005 (1982-1984 = 100)


Year   U.S.    Canada       Japan        France       Germany      Italy        U.K.
1980   82.4    76.1         91.0         [illegible]  86.7         63.9         78.5
1981   90.9    85.6         95.3         81.8         [illegible]  75.5         87.9
1982   96.5    94.9         98.1         91.7         97.0         87.8         95.4
1983   99.6    100.4        99.8         100.3        100.3        100.8        99.8
1984   103.9   104.7        102.1        108.0        102.7        111.4        104.8
1985   107.6   109.0        104.2        114.3        104.8        [illegible]  [illegible]
1986   109.6   113.5        104.9        [illegible]  104.6        128.9        114.9
1987   113.6   118.4        104.9        121.1        104.9        [illegible]  [illegible]
1988   118.3   123.2        105.6        124.3        106.3        141.9        125.6
1989   124.0   129.3        108.0        128.7        109.2        150.7        135.4
1990   130.7   135.5        111.4        132.9        112.2        160.4        148.2
1991   136.2   143.1        115.0        137.2        116.3        170.5        156.9
1992   140.3   145.3        117.0        140.4        122.2        [illegible]  162.7
1993   144.5   147.9        118.5        143.4        127.6        187.7        165.3
1994   148.2   148.2        [illegible]  145.8        131.1        195.3        169.3
1995   152.4   151.4        119.2        148.4        133.3        205.6        173.2
1996   156.9   153.8        119.3        151.4        135.3        213.8        179.4
1997   160.5   156.3        121.5        153.2        137.8        218.2        185.1
1998   163.0   157.8        122.2        154.2        139.1        222.5        191.4
1999   166.6   160.5        121.8        155.0        140.0        226.2        194.3
2000   172.2   164.9        121.0        157.6        142.0        231.9        200.1
2001   177.1   169.1        120.1        160.2        144.8        238.3        203.6
2002   179.9   172.9        119.0        163.3        146.7        244.3        207.0
2003   184.0   [illegible]  118.7        166.7        148.3        250.8        213.0
2004   188.9   181.0        118.7        170.3        150.8        256.3        219.4
2005   195.3   184.9        118.3        173.2        153.7        261.3        225.6


Source: Economic Report of the President, 2007, Table 108, p. 354.

¹⁷Subtract from the current year’s CPI the CPI from the previous year, divide the difference by the previous year’s CPI, and multiply the result by 100. Thus, the inflation rate for Canada for 1981 is [(85.6 − 76.1)/76.1] × 100 = 12.48% (approx.).
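The inflation-rate recipe in footnote 17 translates directly into code. The sketch below applies it to Canada's first three CPI values from Table 1.3; `inflation_rates` is a hypothetical helper name, and the same function would apply unchanged to the WPI series in Table 1.4.

```python
def inflation_rates(index):
    """Year-on-year percentage change of a price index:
    100 * (P_t - P_{t-1}) / P_{t-1}, one value per year after the first."""
    return [100 * (cur - prev) / prev for prev, cur in zip(index, index[1:])]

canada = [76.1, 85.6, 94.9]   # Canada CPI, 1980-1982, from Table 1.3
print([round(r, 2) for r in inflation_rates(canada)])   # rates for 1981 and 1982
```

The first value reproduces the footnote's 12.48 percent for 1981. Note that a series of n index values yields only n - 1 inflation rates.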


Table 1.4 Wholesale Price Index for India (1981-82 = 100)


Year   WPI       Year   WPI
1980   88.20     1993   242.10
1981   99.00     1994   275.60
1982   101.40    1995   297.90
1983   109.70    1996   311.20
1984   118.70    1997   325.40
1985   125.60    1998   344.20
1986   132.30    1999   356.30
1987   140.70    2000   378.60
1988   152.40    2001   398.20
1989   162.50    2002   408.10
1990   177.20    2003   429.70
1991   201.40    2004   458.20
1992   224.70    2005   480.00


Source: Handbook of Industrial Policy and Statistics, 2007-08, Office of Economic Adviser, GOI.


1.3. Table 1.5 gives the foreign exchange rates for four countries for the years 1985 to 2007. Except for Japan, the exchange rate is defined as rupees per unit of foreign currency; for Japan, it is defined as rupees per 100 yen.

a. Plot these exchange rates against time and comment on the general behavior of the exchange rates over the given time period.

b. The rupee is said to appreciate if it can buy more units of a foreign currency. Conversely, it is said to depreciate if it buys fewer units of a foreign currency. Over the time period 1985-2007, what has been the general behavior of the rupee? Incidentally, look up any textbook on macroeconomics or international economics to find out what factors determine the appreciation or depreciation of a currency.
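Since Table 1.5 quotes rupees per unit of foreign currency, a rise in the quoted rate means each rupee buys fewer units of that currency, which is a depreciation. The sketch below computes the percentage change between two years for the US dollar. `pct_change` is a hypothetical helper, and the choice of the 1985 and 2007 endpoints is only for illustration.

```python
def pct_change(start, end):
    return 100 * (end - start) / start

# Rupees per US dollar, Table 1.5
inr_per_usd_1985, inr_per_usd_2007 = 12.3640, 41.2926

change = pct_change(inr_per_usd_1985, inr_per_usd_2007)
# More rupees per dollar means the rupee buys fewer dollars: a depreciation.
verdict = "depreciation" if change > 0 else "appreciation"
print(f"{change:.1f}% -> {verdict} of the rupee against the dollar")
```

Because a percentage change is unaffected by multiplying both values by the same constant, the yen series (quoted per 100 yen) can be treated the same way.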
Table 1.5 Exchange Rates for Four Countries: 1985-2007


Year   US        United Kingdom   Germany   Japan
1985   12.3640   15.9904          4.2282    [illegible]
1986   12.6053   18.4924          5.8414    7.54
1987   12.9552   21.2366          7.2207    8.98
1988   13.9147   24.7729          7.9297    10.87
1989   16.2238   26.5515          8.6438    11.76
1990   17.4992   31.2835          10.8694   12.16
1991   22.6890   39.9941          13.7699   16.92
1992   25.9206   45.7104          16.6354   20.48
1993   31.4439   47.216           19.0264   28.36
1994   31.3742   48.0482          19.4345   30.737
1995   32.4198   51.1662          22.6515   34.6113
1996   35.4280   55.3422          23.5694   32.5971
1997   36.3195   59.5346          20.9861   30.0495
1998   41.2665   68.3525          23.5057   31.668
1999   43.0552   69.67            45.9561   37.9983
2000   44.9401   68.076           41.4939   41.7258
2001   47.1857   67.9826          42.2869   38.8674
2002   48.5993   73.0028          45.9261   38.8722
2003   46.5818   76.0974          52.6603   40.2047
2004   45.3165   82.9983          56.3259   41.8941
2005   44.1000   80.253           54.8993   40.102
2006   45.3325   83.6546          57.0138   39.0195
2007   41.2926   82.6563          56.5623   35.085


Source: Handbook of Statistics on the Indian Economy, 2007-08, Reserve Bank of India, Mumbai


1.4. The data behind the M1 money supply in Figure 1.5 are given in Table 1.6. Can you give reasons why 
the money supply has been increasing over the time period shown in the table? 


Table 1.6 Seasonally Adjusted M1 Supply: 1959:01-1999:07 (billions of dollars)
(Each row lists six consecutive monthly values, beginning with the month shown.)


1959:01 138.8900 139.3900 139.7400 139.6900 140.6800 141.1700 
1959:07 141.7000 141.9000 141.0100 140.4700 140.3800 139.9500 
1960:01 139.9800 139.8700 139.7500 139.5600 139.6100 139.5800 
1960:07 140.1800 141.3100 141.1800 140.9200 140.8600 140.6900
1961:01 141.0600 141.6000 141.8700 142.1300 142.6600 142.8800
1961:07 142.9200 143.4900 143.7800 144.1400 144.7600 145.2000 
1962:01 145.2400 145.6600 145.9600 146.4000 146.8400 146.5800 
1962:07 146.4600 146.5700 146.3000 146.7100 147.2900 147.8200 
1963:01 148.2600 148.9000 149.1700 149.7000 150.3900 150.4300
1963:07 151.3400 151.7800 151.9800 152.5500 153.6500 153.2900 
1964:01 153.7400 154.3100 154.4800 154.7700 155.3300 155.6200
1964:07 156.8000 157.8200 158.7500 159.2400 159.9600 160.3000 
1965:01 160.7100 160.9400 161.4700 162.0300 161.7000 162.1900 
1965:07 163.0500 163.6800 164.8500 165.9700 166.7100 167.8500 
1966:01 169.0800 169.6200 170.5100 171.8100 171.3300 171.5700 
1966:07 170.3100 170.8100 171.9700 171.1600 171.3800 172.0300 
1967:01 171.8600 172.9900 174.8100 174.1700 175.6800 177.0200 
1967:07 178.1300 179.7100 180.6800 181.6400 182.3800 183.2600 
1968:01 184.3300 184.7100 185.4700 186.6000 187.9900 189.4200 
1968:07 190.4900 191.8400 192.7400 194.0200 196.0200 197.4100 
1969:01 198.6900 199.3500 200.0200 200.7100 200.8100 201.2700 
1969:07 201.6600 201.7300 202.1000 202.9000 203.5700 203.8800 
1970:01 206.2200 205.0000 205.7500 206.7200 207.2200 207.5400 
1970:07 207.9800 209.9300 211.8000 212.8800 213.6600 214.4100 
1971:01 215.5400 217.4200 218.7700 220.0000 222.0200 223.4500
1971:07 224.8500 225.5800 226.4700 227.1600 227.7600 228.3200
1972:01 230.0900 232.3200 234.3000 235.5800 235.8900 236.6200
1972:07 238.7900 240.9300 243.1800 245.0200 246.4100 249.2500
1973:01 251.4700 252.1500 251.6700 252.7400 254.8900 256.6900
1973:07 257.5400 257.7600 257.8600 259.0400 260.9800 262.8800
1974:01 263.7600 265.3100 266.6800 267.2000 267.5600 268.4400
1974:07 269.2700 270.1200 271.0500 272.3500 273.7100 274.2000
1975:01 273.9000 275.0000 276.4200 276.1700 279.2000 282.4300 
1975:07 283.6800 284.1500 285.6900 285.3900 286.8300 287.0700 
1976:01 288.4200 290.7600 292.7000 294.6600 295.9300 296.1600 
1976:07 297.2000 299.0500 299.6700 302.0400 303.5900 306.2500 
1977:01 308.2600 311.5400 313.9400 316.0200 317.1900 318.7100 
1977:07 320.1900 322.2700 324.4800 326.4000 328.6400 330.8700 
1978:01 334.4000 335.3000 336.9600 339.9200 344.8600 346.8000 
1978:07 347.6300 349.6600 352.2600 353.3500 355.4100 357.2800 
1979:01 358.6000 359.9100 362.4500 368.0500 369.5900 373.3400 
1979:07 377.2100 378.8200 379.2800 380.8700 380.8100 381.7700 
1980:01 385.8500 389.7000 388.1300 383.4400 384.6000 389.4600 
1980:07 394.9100 400.0600 405.3600 409.0600 410.3700 408.0600 
1981:01 410.8300 414.3800 418.6900 427.0600 424.4300 425.5000 
1981:07 427.9000 427.8500 427.4600 428.4500 430.8800 436.1700 
1982:01 442.1300 441.4900 442.3700 446.7800 446.5300 447.8900 
1982:07 449.0900 452.4900 457.5000 464.5700 471.1200 474.3000 
1983:01 476.6800 483.8500 490.1800 492.7700 499.7800 504.3500 
1983:07 508.9600 511.6000 513.4100 517.2100 518.5300 520.7900 
1984:01 524.4000 526.9900 530.7800 534.0300 536.5900 540.5400 
1984:07 542.1300 542.3900 543.8600 543.8700 547.3200 551.1900 
1985:01 555.6600 562.4800 565.7400 569.5500 575.0700 583.1700 
1985:07 590.8200 598.0600 604.4700 607.9100 611.8300 619.3600 
1986:01 620.4000 624.1400 632.8100 640.3500 652.0100 661.5200 
1986:07 672.2000 680.7700 688.5100 695.2600 705.2400 724.2800 
1987:01 729.3400 729.8400 733.0100 743.3900 746.0000 743.7200 
1987:07 744.9600 746.9600 748.6600 756.5000 752.8300 749.6800 


1988:01 755.5500 757.0700 761.1800 767.5700 771.6800 779.1000 
1988:07 783.4000 785.0800 784.8200 783.6300 784.4600 786.2600 
1989:01 784.9200 783.4000 782.7400 778.8200 774.7900 774.2200 
1989:07 779.7100 781.1400 782.2000 787.0500 787.9500 792.5700 
1990:01 794.9300 797.6500 801.2500 806.2400 804.3600 810.3300 
1990:07 811.8000 817.8500 821.8300 820.3000 822.0600 824.5600 
1991:01 826.7300 832.4000 838.6200 842.7300 848.9600 858.3300 
1991:07 862.9500 868.6500 871.5600 878.4000 887.9500 896.7000 
1992:01 910.4900 925.1300 936.0000 943.8900 950.7800 954.7100 
1992:07 964.6000 975.7100 988.8400 1004.340 1016.040 1024.450 
1993:01 1030.900 1033.150 1037.990 1047.470 1066.220 1075.610 
1993:07 1085.880 1095.560 1105.430 1113.800 1123.900 1129.310 
1994:01 1132.200 1136.130 1139.910 1141.420 1142.850 1145.650 
1994:07 1151.490 1151.390 1152.440 1150.410 1150.440 1149.750 
1995:01 1150.640 1146.740 1146.520 1149.480 1144.650 1144.240 
1995:07 1146.500 1146.100 1142.270 1136.430 1133.550 1126.730 
1996:01 1122.580 1117.530 1122.590 1124.520 1116.300 1115.470 
1996:07 1112.340 1102.180 1095.610 1082.560 1080.490 1081.340 
1997:01 1080.520 1076.200 1072.420 1067.450 1063.370 1065.990 
1997:07 1067.570 1072.080 1064.820 1062.060 1067.530 1074.870 
1998:01 1073.810 1076.020 1080.650 1082.090 1078.170 1077.780 
1998:07 1075.370 1072.210 1074.650 1080.400 1088.960 1093.350 
1999:01 1091.000 1092.650 1102.010 1108.400 1104.750 1101.110 
1999:07 1099.530 1102.400 1093.460 


Source: Board of Governors, Federal Reserve Bank, USA. 


Suppose you were to develop an economic model of criminal activities, say, the hours spent in criminal
activities (e.g., selling illegal drugs). What variables would you consider in developing such a
model? See if your model matches the one developed by the Nobel laureate economist Gary Becker.¹⁸
Controlled experiments in economics: On April 7, 2000, President Clinton signed into law a bill
passed by both Houses of the U.S. Congress that lifted earnings limitations on Social Security recipients.
Until then, recipients between the ages of 65 and 69 who earned more than $17,000 a year would
lose $1 worth of Social Security benefit for every $3 of income earned in excess of $17,000. How
would you devise a study to assess the impact of this change in the law? Note: Under the old law there
was no income limitation for recipients aged 70 and over.
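The old earnings test is simple arithmetic, and writing it down makes clear what the 2000 change removed: for recipients aged 65 to 69, each extra $3 earned above $17,000 cost $1 of benefit. The Python sketch below is hypothetical (the function name `benefit_reduction` and its handling of edge cases are our assumptions); it encodes only the rule as stated in the exercise and ignores, among other things, that the reduction cannot exceed the benefit actually payable.

```python
def benefit_reduction(annual_earnings, age, limit=17_000):
    """Annual Social Security benefit withheld under the pre-2000 rule stated above.

    Assumptions (ours, for illustration): recipients aged 65-69 lose $1 for
    every $3 earned above `limit`; recipients aged 70 and over face no limit.
    The cap at the benefit actually payable is ignored.
    """
    if age < 65:
        raise ValueError("this sketch covers only recipients aged 65 and over")
    if age >= 70:
        return 0.0
    excess = max(0, annual_earnings - limit)  # earnings above the limit
    return excess / 3                         # $1 lost per $3 of excess


print(benefit_reduction(26_000, 67))  # 3000.0
print(benefit_reduction(15_000, 67))  # 0.0
print(benefit_reduction(26_000, 71))  # 0.0
```

Under these assumptions a study of the law change could compare the labor supply of the 65 to 69 group, for whom the implicit one-third tax on marginal earnings disappeared, with that of recipients aged 70 and over, who were unaffected.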
The data presented in Table 1.7 were published in the March 1, 1984, issue of The Wall Street Journal.
They relate to the advertising budget (in millions of dollars) of 21 firms for 1983 and the millions of
impressions retained per week by the viewers of the products of these firms. The data are based on a
survey of 4000 adults in which users of the products were asked to cite a commercial they had seen
for the product category in the past week.
a. Plot impressions on the vertical axis and advertising expenditure on the horizontal axis. 
b. What can you say about the nature of the relationship between the two variables? 
c. Looking at your graph, do you think it pays to advertise? Think about all those commercials shown 
on Super Bowl Sunday or during the World Series. 
Note: We will explore further the data given in Table 1.7 in subsequent chapters. 


¹⁸G. S. Becker, “Crime and Punishment: An Economic Approach,” Journal of Political Economy, vol. 76, 1968, pp. 169-217.


Table 1.7 Impact of Advertising Expenditure

Firm                Impressions,   Expenditure,
                    millions       millions of 1983 dollars
1. Miller Lite      32.1           50.1
2. Pepsi            99.6           74.1
3. Stroh’s          [illegible]    19.3
4. Fed’l Express    21.9           22.9
5. Burger King      60.8           82.4
6. Coca-Cola        78.6           40.1
7. McDonald's       92.4           185.9
8. MCI              50.7           26.9
9. Diet Cola        21.4           20.4
10. Ford            40.1           166.2
11. Levi's          40.8           27.0
12. Bud Lite        10.4           45.6
13. ATT/Bell        88.9           154.9
14. Calvin Klein    12.0           5.0
15. Wendy’s         29.2           49.7
16. Polaroid        38.0           26.9
17. Shasta          10.0           [illegible]
18. Meow Mix        [illegible]    7.6
19. Oscar Meyer     23.4           9.2
20. Crest           71.1           32.4
21. Kibbles ‘N Bits 4.4            6.1
Source: http://lib.stat.cmu.edu/DASL/Datafiles/tvadsdat.html.

Key to Multiple Choice Questions
1. (a)   2. (b)   3. (d)   4. (c)   5. (b)   6. (c)   7. (a)   8. (c)   9. (a)
10. (b)  11. (b)  12. (a)  13. (b)  14. (c)  15. (a)  16. (d)  17. (?)  18. (b)
19. (c)  20. (a)  21. (b)  22. (d)  23. (d)  24. (b)  25. (b)
CHAPTER 2


Two-Variable Regression 
Analysis: Some Basic Ideas 


In Chapter 1 we discussed the concept of regression in broad terms. In this chapter we approach the subject 
somewhat formally. Specifically, this and the following three chapters introduce the reader to the theory 
underlying the simplest possible regression analysis, namely, the bivariate, or two-variable, regression in 
which the dependent variable (the regressand) is related to a single explanatory variable (the regressor). 
This case is considered first, not because of its practical adequacy, but because it presents the fundamental 
ideas of regression analysis as simply as possible and some of these ideas can be illustrated with the aid of 
two-dimensional graphs. Moreover, as we shall see, the more general multiple regression analysis, in which
the regressand is related to two or more regressors, is in many ways a logical extension of the two-variable
case.


2.1 A Hypothetical Example¹


As noted in Section 1.2, regression analysis is largely concerned with “estimating and/or predicting
the (population) mean value² of the dependent variable on the basis of the known or fixed values of the
explanatory variable(s).” To understand this, consider the data given in Table 2.1. The data in the table refer
to a total population of 60 families in a hypothetical community and their weekly income (X) and weekly
consumption expenditure (Y), both in rupees. The 60 families are divided into 10 income groups (from
Rs. 800 to Rs. 2600), and the weekly expenditures of each family in the various groups are as shown in the
table. Therefore, we have 10 fixed values of X and the corresponding Y values against each of the X values; so
to speak, there are 10 Y subpopulations.


¹The reader whose statistical knowledge has become somewhat rusty may want to freshen it up by reading the statistical
appendix, Appendix A, before reading this chapter.


²The expected value, or expectation, or population mean of a random variable Y is denoted by the symbol $E(Y)$. On the
other hand, the mean value computed from a sample of values from the Y population is denoted as $\bar{Y}$, read as “Y bar.”




Table 2.1 Weekly Family Income X, Rs.

X →                       800   1000   1200   1400   1600   1800   2000   2200   2400   2600

Weekly family             550    650    790    800   1020   1100   1200   1350   1370   1500
consumption               600    700    840    930   1070   1150   1360   1370   1450   1520
expenditure               650    740    900    950   1100   1200   1400   1400   1550   1750
Y, Rs.                    700    800    940   1030   1160   1300   1440   1520   1650   1780
                          750    850    980   1080   1180   1350   1450   1570   1750   1800
                            -    880      -   1130   1250   1400      -   1600   1890   1850
                            -      -      -   1150      -      -      -   1620      -   1910

Total                    3250   4620   4450   7070   6780   7500   6850  10430   9660  12110
Conditional means of      650    770    890   1010   1130   1250   1370   1490   1610   1730
Y, E(Y | X)


There is considerable variation in weekly consumption expenditure in each income group, which can be
seen clearly from Figure 2.1. But the general picture that one gets is that, despite the variability of weekly
consumption expenditure within each income bracket, on the average, weekly consumption expenditure
increases as income increases. To see this clearly, in Table 2.1 we have given the mean, or average, weekly
consumption expenditure corresponding to each of the 10 levels of income. Thus, corresponding to the
weekly income level of Rs. 800, the mean consumption expenditure is Rs. 650, while corresponding to the
income level of Rs. 2000, it is Rs. 1370. In all we have 10 mean values for the 10 subpopulations of Y. We
call these mean values conditional expected values, as they depend on the given values of the (conditioning)
variable X. Symbolically, we denote them as $E(Y \mid X)$, which is read as the expected value of Y given the value
of X (see also Table 2.2).
Figure 2.1 Conditional distribution of expenditure for various levels of income (data of Table 2.1).

Table 2.2 Conditional Probabilities $p(Y \mid X_i)$ for the Data of Table 2.1

X →                       800   1000   1200   1400   1600   1800   2000   2200   2400   2600

Conditional               1/5    1/6    1/5    1/7    1/6    1/6    1/5    1/7    1/6    1/7
probabilities             1/5    1/6    1/5    1/7    1/6    1/6    1/5    1/7    1/6    1/7
p(Y | Xᵢ)                 1/5    1/6    1/5    1/7    1/6    1/6    1/5    1/7    1/6    1/7
                          1/5    1/6    1/5    1/7    1/6    1/6    1/5    1/7    1/6    1/7
                          1/5    1/6    1/5    1/7    1/6    1/6    1/5    1/7    1/6    1/7
                            -    1/6      -    1/7    1/6    1/6      -    1/7    1/6    1/7
                            -      -      -    1/7      -      -      -    1/7      -    1/7

Conditional means         650    770    890   1010   1130   1250   1370   1490   1610   1730
of Y


It is important to distinguish these conditional expected values from the unconditional expected value
of weekly consumption expenditure, $E(Y)$. If we add the weekly consumption expenditures for all the 60
families in the population and divide this number by 60, we get the number Rs. 1212 (Rs. 72720/60), which
is the unconditional mean, or expected, value of weekly consumption expenditure, $E(Y)$; it is unconditional
in the sense that in arriving at this number we have disregarded the income levels of the various families.
Obviously, the various conditional expected values of Y given in Table 2.1 are different from the
unconditional expected value of Y of Rs. 1212.³ When we ask the question, “What is the expected value of weekly
consumption expenditure of a family?” we get the answer Rs. 1212 (the unconditional mean). But if we ask
the question, “What is the expected value of weekly consumption expenditure of a family whose weekly
income is, say, Rs. 1400?” we get the answer Rs. 1010 (the conditional mean). To put it differently, if we ask
the question, “What is the best (mean) prediction of weekly expenditure of families with a weekly income of
Rs. 1400?” the answer would be Rs. 1010. Thus the knowledge of the income level may enable us to better
predict the mean value of consumption expenditure than if we do not have that knowledge.⁴ This probably is
the essence of regression analysis, as we shall discover throughout this text.
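The conditional and unconditional means can be reproduced directly from Table 2.1. The Python sketch below is illustrative only (the names `TABLE_2_1`, `conditional_mean`, and `unconditional_mean` are our own): each conditional mean weights the expenditures at one income level by the conditional probabilities 1/n of Table 2.2, while the unconditional mean pools all 60 families and ignores income.

```python
from fractions import Fraction

# Table 2.1: weekly consumption expenditure Y (Rs.) of the 60 families,
# grouped by weekly family income X (Rs.).
TABLE_2_1 = {
    800: [550, 600, 650, 700, 750],
    1000: [650, 700, 740, 800, 850, 880],
    1200: [790, 840, 900, 940, 980],
    1400: [800, 930, 950, 1030, 1080, 1130, 1150],
    1600: [1020, 1070, 1100, 1160, 1180, 1250],
    1800: [1100, 1150, 1200, 1300, 1350, 1400],
    2000: [1200, 1360, 1400, 1440, 1450],
    2200: [1350, 1370, 1400, 1520, 1570, 1600, 1620],
    2400: [1370, 1450, 1550, 1650, 1750, 1890],
    2600: [1500, 1520, 1750, 1780, 1800, 1850, 1910],
}


def conditional_mean(x):
    """E(Y | X = x): each of the n families at income x has probability 1/n (Table 2.2)."""
    ys = TABLE_2_1[x]
    p = Fraction(1, len(ys))
    return sum(y * p for y in ys)


def unconditional_mean():
    """E(Y): the average over all 60 families, disregarding income."""
    all_ys = [y for ys in TABLE_2_1.values() for y in ys]
    return Fraction(sum(all_ys), len(all_ys))


cond = {x: conditional_mean(x) for x in TABLE_2_1}
print({x: int(m) for x, m in cond.items()})  # 800: 650, 1000: 770, ..., 2600: 1730
print(unconditional_mean())                  # 1212
```

Exact fractions are used so that the means print as whole rupees rather than floating-point approximations.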

The dark circled points in Figure 2.1 show the conditional mean values of Y against the various X values.
If we join these conditional mean values, we obtain what is known as the population regression line (PRL),
or more generally, the population regression curve.⁵ More simply, it is the regression of Y on X. The
adjective “population” comes from the fact that we are dealing in this example with the entire population of
60 families. Of course, in reality a population may have many families.

Geometrically, then, a population regression curve is simply the locus of the conditional means of the 
dependent variable for the fixed values of the explanatory variable(s). More simply, it is the curve connecting 
the means of the subpopulations of Y corresponding to the given values of the regressor X. It can be depicted 
as in Figure 2.2. 

This figure shows that for each X (i.e., income level) there is a population of Y values (weekly consumption 
expenditures) that are spread around the (conditional) mean of those Y values. For simplicity, we are assuming 
that these Y values are distributed symmetrically around their respective (conditional) mean values. And the 
regression line (or curve) passes through these (conditional) mean values. 


³As shown in Appendix A, in general the conditional and unconditional mean values are different.


⁴I am indebted to James Davidson on this perspective. See James Davidson, Econometric Theory, Blackwell Publishers,
Oxford, U.K., 2000, p. 11.


⁵In the present example the PRL is a straight line, but it could be a curve (see Figure 2.3).
Figure 2.2 Population regression line (data of Table 2.1).


With this background, the reader may find it instructive to reread the definition of regression given in 
Section 1.2. 


2.2 The Concept of Population Regression Function (PRF) 


From the preceding discussion and Figures 2.1 and 2.2, it is clear that each conditional mean $E(Y \mid X_i)$ is a
function of $X_i$, where $X_i$ is a given value of X. Symbolically,

$$E(Y \mid X_i) = f(X_i) \tag{2.2.1}$$

where $f(X_i)$ denotes some function of the explanatory variable X. In our example, $E(Y \mid X_i)$ is a linear function
of $X_i$. Equation 2.2.1 is known as the conditional expectation function (CEF) or population regression
function (PRF) or population regression (PR) for short. It states merely that the expected value of the
distribution of Y given $X_i$ is functionally related to $X_i$. In simple terms, it tells how the mean or average response
of Y varies with X.

What form does the function $f(X_i)$ assume? This is an important question because in real situations we
do not have the entire population available for examination. The functional form of the PRF is therefore an
empirical question, although in specific cases theory may have something to say. For example, an economist
might posit that consumption expenditure is linearly related to income. Therefore, as a first approximation or
a working hypothesis, we may assume that the PRF $E(Y \mid X_i)$ is a linear function of $X_i$, say, of the type

$$E(Y \mid X_i) = \beta_1 + \beta_2 X_i \tag{2.2.2}$$

where $\beta_1$ and $\beta_2$ are unknown but fixed parameters known as the regression coefficients; $\beta_1$ and $\beta_2$ are
also known as the intercept and slope coefficients, respectively. Equation 2.2.2 itself is known as the linear
population regression function. Some alternative expressions used in the literature are linear population
regression model or simply linear population regression. In the sequel, the terms regression, regression
equation, and regression model will be used synonymously.
In regression analysis our interest is in estimating PRFs like Equation 2.2.2, that is, estimating the
values of the unknowns $\beta_1$ and $\beta_2$ on the basis of observations on Y and X. This topic will be studied in detail
in Chapter 3.
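Because Table 2.1 is treated as the entire population, its PRF can be checked directly rather than estimated. A minimal Python sketch, using exact rational arithmetic from the standard `fractions` module: take the slope and intercept implied by the first two conditional means and test whether all ten lie on that line. For these data they do, with $\beta_1 = 170$ and $\beta_2 = 0.6$; these two values are computed here from the table, not quoted from the text, and in real applications the β's are unknown and must be estimated.

```python
from fractions import Fraction

# Conditional means E(Y | X) from Table 2.1 (Rs. per week).
X = [800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600]
EY = [650, 770, 890, 1010, 1130, 1250, 1370, 1490, 1610, 1730]

# Slope and intercept implied by the first two conditional means.
beta2 = Fraction(EY[1] - EY[0], X[1] - X[0])
beta1 = EY[0] - beta2 * X[0]

# A linear PRF, Eq. (2.2.2), requires every conditional mean to satisfy
# E(Y | X) = beta1 + beta2 * X exactly.
on_line = all(beta1 + beta2 * x == ey for x, ey in zip(X, EY))
print(beta1, beta2, on_line)  # 170 3/5 True
```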


2.3 The Meaning of the Term Linear 


Since this text is concerned primarily with linear models like Eq. (2.2.2), it is essential to know what the term 
linear really means, for it can be interpreted in two different ways. 


Linearity in the Variables 


The first and perhaps more “natural” meaning of linearity is that the conditional expectation of Y is a linear
function of $X_i$, such as, for example, Eq. (2.2.2).⁶ Geometrically, the regression curve in this case is a straight
line. In this interpretation, a regression function such as $E(Y \mid X_i) = \beta_1 + \beta_2 X_i^2$ is not a linear function
because the variable X appears with a power or index of 2.


Linearity in the Parameters 


The second interpretation of linearity is that the conditional expectation of Y, $E(Y \mid X_i)$, is a linear function of the
parameters, the $\beta$'s; it may or may not be linear in the variable X.⁷ In this interpretation $E(Y \mid X_i) = \beta_1 + \beta_2 X_i^2$
is a linear (in the parameter) regression model. To see this, let us suppose X takes the value 3. Therefore,
$E(Y \mid X = 3) = \beta_1 + 9\beta_2$, which is obviously linear in $\beta_1$ and $\beta_2$. All the models shown in Figure 2.3 are thus
linear regression models, that is, models linear in the parameters.

Now consider the model $E(Y \mid X_i) = \beta_1 + \beta_2^2 X_i$. Now suppose X = 3; then we obtain $E(Y \mid X_i) =
\beta_1 + 3\beta_2^2$, which is nonlinear in the parameter $\beta_2$. The preceding model is an example of a nonlinear (in the
parameter) regression model. We will discuss such models in Chapter 14.

Of the two interpretations of linearity, linearity in the parameters is relevant for the development of the
regression theory to be presented shortly. Therefore, from now on, the term “linear” regression will always
mean a regression that is linear in the parameters; the $\beta$'s (that is, the parameters) are raised to the first
power only. It may or may not be linear in the explanatory variables, the X's. Schematically, we have Table
2.3. Thus, $E(Y \mid X_i) = \beta_1 + \beta_2 X_i$, which is linear both in the parameters and the variable, is an LRM, and so is
$E(Y \mid X_i) = \beta_1 + \beta_2 X_i^2$, which is linear in the parameters but nonlinear in the variable X.
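Linearity in the parameters can also be probed numerically: holding X fixed, a model linear in the β's must satisfy superposition, so that plugging in a weighted combination of two parameter vectors gives the same weighted combination of the two outputs. The Python sketch below is a heuristic check, not a proof (the helper names `superposition_holds`, `quadratic_in_x`, and `squared_beta` are hypothetical); it contrasts $\beta_1 + \beta_2 X^2$ with $\beta_1 + \beta_2^2 X$ at X = 3.

```python
import random


def superposition_holds(f, x, trials=100, tol=1e-9, seed=0):
    """Check numerically whether f(beta, x) is linear in the parameter vector beta.

    For fixed x, linearity in beta means
        f(a*b + c*g, x) == a*f(b, x) + c*f(g, x)
    for all parameter vectors b, g and scalars a, c. Passing the check is
    evidence, not proof; a single failure is conclusive.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        b = [rng.uniform(-5, 5) for _ in range(2)]
        g = [rng.uniform(-5, 5) for _ in range(2)]
        a, c = rng.uniform(-5, 5), rng.uniform(-5, 5)
        combo = [a * bi + c * gi for bi, gi in zip(b, g)]
        if abs(f(combo, x) - (a * f(b, x) + c * f(g, x))) > tol:
            return False
    return True


def quadratic_in_x(beta, x):
    """E(Y|X) = beta1 + beta2 * X**2: nonlinear in X, linear in the betas."""
    return beta[0] + beta[1] * x ** 2


def squared_beta(beta, x):
    """E(Y|X) = beta1 + beta2**2 * X: linear in X, nonlinear in beta2."""
    return beta[0] + beta[1] ** 2 * x


print(superposition_holds(quadratic_in_x, 3.0))  # True
print(superposition_holds(squared_beta, 3.0))    # False
```

The first model passes because, at any fixed X, it is just a weighted sum of $\beta_1$ and $\beta_2$; the second fails because $\beta_2$ enters squared.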


⁶A function $Y = f(X)$ is said to be linear in X if X appears with a power or index of 1 only (that is, terms such as $X^2$, $\sqrt{X}$, and
so on, are excluded) and is not multiplied or divided by any other variable (for example, $X \cdot Z$ or $X/Z$, where Z is another
variable). If Y depends on X alone, another way to state that Y is linearly related to X is that the rate of change of Y with
respect to X (i.e., the slope, or derivative, of Y with respect to X, $dY/dX$) is independent of the value of X. Thus, if $Y = 4X$,
$dY/dX = 4$, which is independent of the value of X. But if $Y = 4X^2$, $dY/dX = 8X$, which is not independent of the value taken
by X. Hence this function is not linear in X.


⁷A function is said to be linear in the parameter, say, $\beta_1$, if $\beta_1$ appears with a power of 1 only and is not multiplied or divided
by any other parameter (for example, $\beta_1 \beta_2$, $\beta_2/\beta_1$, and so on).
Figure 2.3 Linear-in-parameter functions.


Table 2.3 Linear Regression Models

                                Model Linear in Variables?
Model Linear in Parameters?     Yes       No
Yes                             LRM       LRM
No                              NLRM      NLRM

Note: LRM = linear regression model
NLRM = nonlinear regression model


2.4 Stochastic Specification of PRF 


It is clear from Figure 2.1 that, as family income increases, family consumption expenditure on the average 
increases, too. But what about the consumption expenditure of an individual family in relation to its (fixed) 
level of income? It is obvious from Table 2.1 and Figure 2.1 that an individual family’s consumption
expenditure does not necessarily increase as the income level increases. For example, from Table 2.1 we observe
that corresponding to the income level of Rs. 1000 there is one family whose consumption expenditure of 
Rs. 650 is less than the consumption expenditures of two families whose weekly income is only Rs. 800. But 
notice that the average consumption expenditure of families with a weekly income of Rs. 1000 is greater than 
the average consumption expenditure of families with a weekly income of Rs. 800 (Rs. 770 versus Rs. 650). 

What, then, can we say about the relationship between an individual family’s consumption expenditure
and a given level of income? We see from Figure 2.1 that, given the income level of $X_i$, an individual family’s
consumption expenditure is clustered around the average consumption of all families at that $X_i$, that is, around
its conditional expectation. Therefore, we can express the deviation of an individual $Y_i$ around its expected
value as follows:


$$u_i = Y_i - E(Y \mid X_i)$$
or
$$Y_i = E(Y \mid X_i) + u_i \tag{2.4.1}$$


where the deviation $u_i$ is an unobservable random variable taking positive or negative values. Technically, $u_i$
is known as the stochastic disturbance or stochastic error term.

How do we interpret Equation 2.4.1? We can say that the expenditure of an individual family, given
its income level, can be expressed as the sum of two components: (1) $E(Y \mid X_i)$, which is simply the mean
consumption expenditure of all the families with the same level of income and is known as the
systematic, or deterministic, component; and (2) $u_i$, which is the random, or nonsystematic, component.
We shall examine shortly the nature of the stochastic disturbance term, but for the moment assume that it
is a surrogate or proxy for all the omitted or neglected variables that may affect Y but are not (or cannot be)
included in the regression model.

If $E(Y \mid X_i)$ is assumed to be linear in $X_i$, as in Eq. (2.2.2), Eq. (2.4.1) may be written as

$$\begin{aligned} Y_i &= E(Y \mid X_i) + u_i \\ &= \beta_1 + \beta_2 X_i + u_i \end{aligned} \tag{2.4.2}$$


Equation 2.4.2 posits that the consumption expenditure of a family is linearly related to its income plus the
disturbance term. Thus, the individual consumption expenditures, given X = Rs. 800 (see Table 2.1), can be
expressed as

$$\begin{aligned}
Y_1 = 550 &= \beta_1 + \beta_2(800) + u_1 \\
Y_2 = 600 &= \beta_1 + \beta_2(800) + u_2 \\
Y_3 = 650 &= \beta_1 + \beta_2(800) + u_3 \\
Y_4 = 700 &= \beta_1 + \beta_2(800) + u_4 \\
Y_5 = 750 &= \beta_1 + \beta_2(800) + u_5
\end{aligned} \tag{2.4.3}$$


Now if we take the expected value of Eq. (2.4.1) on both sides, we obtain

$$\begin{aligned} E(Y_i \mid X_i) &= E[E(Y \mid X_i)] + E(u_i \mid X_i) \\ &= E(Y \mid X_i) + E(u_i \mid X_i) \end{aligned} \tag{2.4.4}$$


where use is made of the fact that the expected value of a constant is that constant itself.⁸ Notice carefully that
in Equation 2.4.4 we have taken the conditional expectation, conditional upon the given X’s.
Since $E(Y_i \mid X_i)$ is the same thing as $E(Y \mid X_i)$, Eq. (2.4.4) implies that


$$E(u_i \mid X_i) = 0 \tag{2.4.5}$$


Thus, the assumption that the regression line passes through the conditional means of Y (see Figure 2.2)
implies that the conditional mean values of $u_i$ (conditional upon the given X's) are zero.
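For the population of Table 2.1 the disturbances can actually be computed, which makes Eqs. (2.4.1), (2.4.3), and (2.4.5) concrete. A short Python sketch (the dictionary name `TABLE_2_1` is ours): subtract each income group's conditional mean from the expenditures in that group. At X = Rs. 800, where $E(Y \mid X) = 650$, the five disturbances in Eq. (2.4.3) are −100, −50, 0, 50, and 100, and in every group the disturbances average to zero by construction.

```python
from statistics import mean

# Table 2.1: weekly consumption expenditure Y (Rs.) by weekly income X (Rs.).
TABLE_2_1 = {
    800: [550, 600, 650, 700, 750],
    1000: [650, 700, 740, 800, 850, 880],
    1200: [790, 840, 900, 940, 980],
    1400: [800, 930, 950, 1030, 1080, 1130, 1150],
    1600: [1020, 1070, 1100, 1160, 1180, 1250],
    1800: [1100, 1150, 1200, 1300, 1350, 1400],
    2000: [1200, 1360, 1400, 1440, 1450],
    2200: [1350, 1370, 1400, 1520, 1570, 1600, 1620],
    2400: [1370, 1450, 1550, 1650, 1750, 1890],
    2600: [1500, 1520, 1750, 1780, 1800, 1850, 1910],
}

# u_i = Y_i - E(Y | X_i): each family's deviation from its group's conditional mean.
disturbances = {}
for x, ys in TABLE_2_1.items():
    cond_mean = mean(ys)                         # E(Y | X = x)
    disturbances[x] = [y - cond_mean for y in ys]

print(disturbances[800])  # deviations -100, -50, 0, 50, 100 around E(Y | X = 800) = 650

# Within every income group the deviations average to zero, as in Eq. (2.4.5).
print(all(mean(us) == 0 for us in disturbances.values()))  # True
```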


⁸See Appendix A for a brief discussion of the properties of the expectation operator E. Note that $E(Y \mid X_i)$, once the value
of $X_i$ is fixed, is a constant.
From the previous discussion, it is clear that Eq. (2.2.2) and Eq. (2.4.2) are equivalent forms if $E(u_i \mid X_i) = 0$.⁹
But the stochastic specification in Eq. (2.4.2) has the advantage that it clearly shows that there are other
variables besides income that affect consumption expenditure and that an individual family’s consumption
expenditure cannot be fully explained only by the variable(s) included in the regression model.


2.5 The Significance of the Stochastic Disturbance Term 


As noted in Section 2.4, the disturbance term $u_i$ is a surrogate for all those variables that are omitted from
the model but that collectively affect Y. The obvious question is: Why not introduce these variables into the
model explicitly? Stated otherwise, why not develop a multiple regression model with as many variables as
possible? The reasons are many.

1. Vagueness of theory: The theory, if any, determining the behavior of Y may be, and often is, incomplete.
We might know for certain that weekly income X influences weekly consumption expenditure Y, but we
might be ignorant or unsure about the other variables affecting Y. Therefore, $u_i$ may be used as a substitute for
all the excluded or omitted variables from the model.

2. Unavailability of data: Even if we know what some of the excluded variables are and therefore consider 
a multiple regression rather than a simple regression, we may not have quantitative information about these 
variables. It is a common experience in empirical analysis that the data we would ideally like to have often 
are not available. For example, in principle we could introduce family wealth as an explanatory variable in
addition to the income variable to explain family consumption expenditure. But unfortunately, information 
on family wealth generally is not available. Therefore, we may be forced to omit the wealth variable from our 
model despite its great theoretical relevance in explaining consumption expenditure. 

3. Core variables versus peripheral variables: Assume in our consumption-income example that besides
income $X_1$, the number of children per family $X_2$, sex $X_3$, religion $X_4$, education $X_5$, and geographical region
$X_6$ also affect consumption expenditure. But it is quite possible that the joint influence of all or some of these
variables may be so small and at best nonsystematic or random that as a practical matter and for cost
considerations it does not pay to introduce them into the model explicitly. One hopes that their combined effect can
be treated as a random variable $u_i$.¹⁰

4. Intrinsic randomness in human behavior: Even if we succeed in introducing all the relevant variables 
into the model, there is bound to be some “intrinsic” randomness in individual Y’s that cannot be explained 
no matter how hard we try. The disturbances, the u’s, may very well reflect this intrinsic randomness. 

5. Poor proxy variables: Although the classical regression model (to be developed in Chapter 3) assumes
that the variables Y and X are measured accurately, in practice the data may be plagued by errors of
measurement. Consider, for example, Milton Friedman’s well-known theory of the consumption function.¹¹
He regards permanent consumption ($Y^p$) as a function of permanent income ($X^p$). But since data on these
variables are not directly observable, in practice we use proxy variables, such as current consumption (Y) and
current income (X), which are observable. Since the observed Y and X may not equal $Y^p$ and $X^p$, there is the
problem of errors of measurement. The disturbance term u may in this case then also represent the errors of
measurement. As we will see in a later chapter, if there are such errors of measurement, they can have serious
implications for estimating the regression coefficients, the $\beta$'s.


⁹As a matter of fact, in the method of least squares to be developed in Chapter 3, it is assumed explicitly that $E(u_i \mid X_i) = 0$.
See Sec. 3.2.

¹⁰A further difficulty is that variables such as sex, education, and religion are difficult to quantify.

¹¹Milton Friedman, A Theory of the Consumption Function, Princeton University Press, Princeton, N.J., 1957.
6. Principle of parsimony: Following Occam’s razor,¹² we would like to keep our regression model as
simple as possible. If we can explain the behavior of Y “substantially” with two or three explanatory variables
and if our theory is not strong enough to suggest what other variables might be included, why introduce
more variables? Let $u_i$ represent all other variables. Of course, we should not exclude relevant and important
variables just to keep the regression model simple.

7. Wrong functional form: Even if we have theoretically correct variables explaining a phenomenon
and even if we can obtain data on these variables, very often we do not know the form of the functional
relationship between the regressand and the regressors. Is consumption expenditure a linear (in the variable)
function of income or a nonlinear (in the variable) function? If it is the former, $Y_i = \beta_1 + \beta_2 X_i + u_i$ is the proper
functional relationship between Y and X, but if it is the latter, $Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + u_i$ may be the correct
functional form. In two-variable models the functional form of the relationship can often be judged from the
scattergram. But in a multiple regression model, it is not easy to determine the appropriate functional form,
for graphically we cannot visualize scattergrams in multiple dimensions.

For all these reasons, the stochastic disturbances $u_i$ assume an extremely critical role in regression analysis,
which we will see as we progress.


2.6 The Sample Regression Function (SRF) 


By confining our discussion so far to the population of Y values corresponding to the fixed X’s, we have 
deliberately avoided sampling considerations (note that the data of Table 2.1 represent the population, not 
a sample). But it is about time to face up to the sampling problems, for in most practical situations what we 
have is but a sample of Y values corresponding to some fixed X’s. Therefore, our task now is to estimate the 
PRF on the basis of the sample information.

As an illustration, pretend that the population of Table 2.1 was not known to us and the only information
we had was a randomly selected sample of Y values for the fixed X’s as given in Table 2.4. Unlike Table
2.1, we now have only one Y value corresponding to the given X’s; each Y (given $X_i$) in Table 2.4 is chosen
randomly from similar Y’s corresponding to the same $X_i$ from the population of Table 2.1.
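The sampling scheme just described, one randomly chosen Y for each fixed X, can be sketched in a few lines of Python. The function name `draw_sample` and the seed are our own choices for illustration; a particular run will generally not reproduce Table 2.4 or Table 2.5, since each of those is just one such draw, and that variability is exactly the point developed next.

```python
import random

# Table 2.1: weekly consumption expenditure Y (Rs.) by weekly income X (Rs.).
TABLE_2_1 = {
    800: [550, 600, 650, 700, 750],
    1000: [650, 700, 740, 800, 850, 880],
    1200: [790, 840, 900, 940, 980],
    1400: [800, 930, 950, 1030, 1080, 1130, 1150],
    1600: [1020, 1070, 1100, 1160, 1180, 1250],
    1800: [1100, 1150, 1200, 1300, 1350, 1400],
    2000: [1200, 1360, 1400, 1440, 1450],
    2200: [1350, 1370, 1400, 1520, 1570, 1600, 1620],
    2400: [1370, 1450, 1550, 1650, 1750, 1890],
    2600: [1500, 1520, 1750, 1780, 1800, 1850, 1910],
}


def draw_sample(population, rng):
    """Pick one Y at random from each fixed-X subpopulation; return (Y, X) pairs."""
    return [(rng.choice(ys), x) for x, ys in sorted(population.items())]


rng = random.Random(2024)  # arbitrary seed, fixed only so a run is reproducible
sample = draw_sample(TABLE_2_1, rng)
for y, x in sample:
    print(f"{y:>5} {x:>5}")
```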

The question is: From the sample of Table 2.4 can we predict the average weekly consumption expenditure 
Y in the population as a whole corresponding to the chosen X’s? In other words, can we estimate the PRF 
from the sample data? As the reader surely suspects, we may not be able to estimate the PRF “accurately” 
because of sampling fluctuations. To see this, suppose we draw another random sample from the population 
of Table 2.1, as presented in Table 2.5. 

Plotting the data of Tables 2.4 and 2.5, we obtain the scattergram given in Figure 2.4. In the scattergram
two sample regression lines are drawn so as to “fit” the scatters reasonably well: $\mathrm{SRF}_1$ is based on the first
sample, and $\mathrm{SRF}_2$ is based on the second sample. Which of the two regression lines represents the “true”
population regression line? If we avoid the temptation of looking at Figure 2.1, which purportedly represents
the PR, there is no way we can be absolutely sure that either of the regression lines shown in Figure 2.4
represents the true population regression line (or curve). The regression lines in Figure 2.4 are known as the
sample regression lines. Supposedly they represent the population regression line, but because of sampling
fluctuations they are at best an approximation of the true PR. In general, we would get N different SRFs for
N different samples, and these SRFs are not likely to be the same.


¹²“That descriptions be kept as simple as possible until proved inadequate,” The World of Mathematics, vol. 2, J. R. Newman
(ed.), Simon & Schuster, New York, 1956, p. 1247, or, “Entities should not be multiplied beyond necessity,” Donald F.
Morrison, Applied Linear Statistical Methods, Prentice Hall, Englewood Cliffs, N.J., 1983, p. 58.
Table 2.4 A Random Sample from the Population of Table 2.1

   Y      X
 700    800
 650   1000
 900   1200
 950   1400
1100   1600
1150   1800
1200   2000
1400   2200
1550   2400
1500   2600

Table 2.5 A Random Sample from the Population of Table 2.1

   Y      X
 550    800
 880   1000
 900   1200
 800   1400
1180   1600
1200   1800
1450   2000
1350   2200
1450   2400
1750   2600
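How far apart can two sample regression lines be? Assuming the lines in Figure 2.4 are fitted by least squares (the method developed in Chapter 3), the Python sketch below fits one line to each of Tables 2.4 and 2.5. The helper `fit_line` is our own, written with the usual deviation-from-means formulas rather than any statistics library.

```python
X = [800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600]
Y_SAMPLE_1 = [700, 650, 900, 950, 1100, 1150, 1200, 1400, 1550, 1500]  # Table 2.4
Y_SAMPLE_2 = [550, 880, 900, 800, 1180, 1200, 1450, 1350, 1450, 1750]  # Table 2.5


def fit_line(x, y):
    """Least-squares intercept and slope of y on x (formulas derived in Chapter 3)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


for label, y in (("SRF1", Y_SAMPLE_1), ("SRF2", Y_SAMPLE_2)):
    b1, b2 = fit_line(X, y)
    print(f"{label}: Y-hat = {b1:.4f} + {b2:.4f} X")
# SRF1: Y-hat = 244.5455 + 0.5091 X
# SRF2: Y-hat = 171.6970 + 0.5761 X
```

Both samples come from the same population with the same fixed X's, yet the intercepts differ by more than Rs. 70 and the slopes by almost 0.07: this is the sampling fluctuation discussed above.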
Figure 2.4 Regression lines based on two different samples (first sample: Table 2.4; second sample: Table 2.5; horizontal axis: weekly income, Rs).


Now, analogously to the PRF that underlies the population regression line, we can develop the concept
of the sample regression function (SRF) to represent the sample regression line. The sample counterpart of
Eq. (2.2.2) may be written as

$$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i \tag{2.6.1}$$

where $\hat{Y}$ is read as “Y-hat” or “Y-cap” and

$\hat{Y}_i$ = estimator of $E(Y \mid X_i)$
$\hat{\beta}_1$ = estimator of $\beta_1$
$\hat{\beta}_2$ = estimator of $\beta_2$
Note that an estimator, also known as a (sample) statistic, is simply a rule or formula or method that tells how to estimate the population parameter from the information provided by the sample at hand. A particular numerical value obtained by the estimator in an application is known as an estimate.¹³ It should be noted that an estimator is random, but an estimate is nonrandom. (Why?)
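A small simulation can make the distinction concrete. In the sketch below the estimator is a Python function, one fixed rule; applied to each new random sample it returns a different number, an estimate. The population line $E(Y \mid X) = 200 + 0.5X$, the normal disturbances with standard deviation 100, and the seed are arbitrary assumptions made purely for illustration.

```python
import random

# Hypothetical population (an assumption for illustration only):
# E(Y | X) = 200 + 0.5 X, with normally distributed disturbances.
BETA1, BETA2, SIGMA = 200.0, 0.5, 100.0
X = [800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600]


def slope_estimator(x, y):
    """The estimator: one fixed rule that maps any sample (x, y) to a number."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def draw_sample(rng):
    """Draw one random sample of Y values at the fixed X values."""
    return [BETA1 + BETA2 * xi + rng.gauss(0.0, SIGMA) for xi in X]


rng = random.Random(42)  # seeded so the run is reproducible
samples = [draw_sample(rng) for _ in range(5)]
estimates = [slope_estimator(X, y) for y in samples]  # one estimate per sample
for k, b2 in enumerate(estimates, start=1):
    print(f"sample {k}: estimate of beta2 = {b2:.4f}")

# Applying the same rule to the same sample always returns the same number.
assert slope_estimator(X, samples[0]) == estimates[0]
```

The rule varies in its output only because the sample varies, which is why the estimator is a random variable; once a particular sample is in hand, the estimate is just a fixed number.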
Now just as we expressed the PRF in two equivalent forms, Eq. (2.2.2) and Eq. (2.4.2), we can express the SRF in Eq. (2.6.1) in its stochastic form as follows:

$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i \tag{2.6.2}$$

where, in addition to the symbols already defined, $\hat{u}_i$ denotes the (sample) residual term. Conceptually $\hat{u}_i$ is analogous to $u_i$ and can be regarded as an estimate of $u_i$. It is introduced in the SRF for the same reasons as $u_i$ was introduced in the PRF.
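Because $u_i$ is unobservable in practice, the difference between $u_i$ and $\hat{u}_i$ is easiest to see in a simulation where we choose the PRF ourselves. The sketch below assumes a "true" PRF (an arbitrary choice), computes the true disturbances $u_i = Y_i - E(Y \mid X_i)$, fits an SRF by least squares (formulas borrowed from Chapter 3), and compares them with the residuals $\hat{u}_i = Y_i - \hat{Y}_i$.

```python
import random

# Assumed "true" PRF for the simulation: E(Y | X) = 200 + 0.5 X
BETA1, BETA2, SIGMA = 200.0, 0.5, 100.0
X = [800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600]

rng = random.Random(7)
u = [rng.gauss(0.0, SIGMA) for _ in X]                  # true disturbances u_i
Y = [BETA1 + BETA2 * xi + ui for xi, ui in zip(X, u)]  # observed Y_i

# Least-squares SRF (formulas borrowed from Chapter 3, Eqs. 3.1.6 and 3.1.7)
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
      / sum((xi - x_bar) ** 2 for xi in X))
b1 = y_bar - b2 * x_bar

u_hat = [yi - (b1 + b2 * xi) for xi, yi in zip(X, Y)]  # residuals u-hat_i

for xi, ui, uhi in zip(X, u, u_hat):
    print(f"X = {xi:5d}   u = {ui:8.2f}   u-hat = {uhi:8.2f}")
```

The two columns move together, but they are not equal: $\hat{u}_i$ is computed from estimated coefficients, while $u_i$ depends on the true (and in practice unknown) $\beta_1$ and $\beta_2$.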

To sum up, then, we find our primary objective in regression analysis is to estimate the PRF

$$Y_i = \beta_1 + \beta_2 X_i + u_i \tag{2.4.2}$$

on the basis of the SRF

$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i \tag{2.6.2}$$

because more often than not our analysis is based upon a single sample from some population. But because of sampling fluctuations, our estimate of the PRF based on the SRF is at best an approximate one. This approximation is shown diagrammatically in Figure 2.5.


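[Figure 2.5: the SRF and the PRF drawn together; vertical axis: weekly consumption expenditure, Rs; horizontal axis: weekly income, Rs, with the value $X_i$ marked.]
Figure 2.5 Sample and population regression lines.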


For $X = X_i$, we have one (sample) observation, $Y = Y_i$. In terms of the SRF, the observed $Y_i$ can be expressed as

$$Y_i = \hat{Y}_i + \hat{u}_i \tag{2.6.3}$$

and in terms of the PRF, it can be expressed as

$$Y_i = E(Y \mid X_i) + u_i \tag{2.6.4}$$

Now obviously in Figure 2.5 $\hat{Y}_i$ overestimates the true $E(Y \mid X_i)$ for the $X_i$ shown therein. By the same token, for any $X_i$ to the left of the point A, the SRF will underestimate the true PRF. But the reader can readily see that such over- and underestimation is inevitable because of sampling fluctuations.
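The geometry of Figure 2.5 can be mimicked with two hypothetical straight lines. In the sketch below the PRF $E(Y \mid X) = 200 + 0.5X$ and the SRF $\hat{Y} = 150 + 0.55X$ are made-up values chosen only so that the SRF is steeper than the PRF; the X value at which they cross plays the role of point A. The helper names are illustrative.

```python
# Hypothetical lines chosen only to mimic the shape of Figure 2.5:
# the SRF is steeper than the PRF, so the two cross at a single point A.
PRF = (200.0, 0.50)   # (beta1, beta2), assumed "true" values
SRF = (150.0, 0.55)   # (beta1-hat, beta2-hat), assumed estimates


def line(params, x):
    """Value of the straight line intercept + slope * x."""
    intercept, slope = params
    return intercept + slope * x


def crossing_point(prf, srf):
    """X value at which the SRF and the PRF intersect (point A)."""
    (beta1, beta2), (b1, b2) = prf, srf
    if b2 == beta2:
        return None  # parallel lines never cross
    return (beta1 - b1) / (b2 - beta2)


x_a = crossing_point(PRF, SRF)
print(f"Point A at X = {x_a:.1f}")
for x in (800, 1200, 2000):
    gap = line(SRF, x) - line(PRF, x)  # Y-hat minus E(Y | X)
    verdict = "overestimates" if gap > 0 else "underestimates"
    print(f"X = {x}: SRF {verdict} the PRF by {abs(gap):.1f}")
```

With these assumed numbers point A falls at $X = 1000$: to its left the SRF lies below the PRF, and to its right it lies above, matching the pattern described for Figure 2.5.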


¹³As noted in the Introduction, a hat above a variable will signify an estimator of the relevant population value.
The critical question now is: Granted that the SRF is but an approximation of the PRF, can we devise a rule or a method that will make this approximation as "close" as possible? In other words, how should the SRF be constructed so that $\hat{\beta}_1$ is as "close" as possible to the true $\beta_1$ and $\hat{\beta}_2$ is as "close" as possible to the true $\beta_2$, even though we will never know the true $\beta_1$ and $\beta_2$?

The answer to this question will occupy much of our attention in Chapter 3. We note here that we can 
develop procedures that tell us how to construct the SRF to mirror the PRF as faithfully as possible. It is 
fascinating to consider that this can be done even though we never actually determine the PRF itself. 


2.7 Illustrative Examples 


We conclude this chapter with two examples. 


Example 2.1 Mean Hourly Wage by Education 


Table 2.6 gives data on the level of education (measured by the number of years of schooling), the mean hourly wages earned by people at each level of education, and the number of people at the stated level of education. Ernst Berndt originally obtained the data presented in the table, and he derived these data from the population survey conducted in May 1985.¹⁴

Plotting the (conditional) mean wage against education, we obtain the picture in Figure 2.6. The regression 
curve in the figure shows how mean wages vary with the level of education; they generally increase with the 
level of education, a finding one should not find surprising. We will study in a later chapter how variables 
besides education can also affect the mean wage. 


Table 2.6 Mean Hourly Wage by Education

Years of Schooling    Mean Wage, $    Number of People
        6                4.4567              3
        7                5.7700              5
        8                5.9787             15
        9                7337               12
       10                7.3182             17
       11                6.5844             27
       12                7.8182            218
       13                7.8351             37
       14               11.0223             56
       15               10.6738             13
       16               10.8361             70
       17               13.6150             24
       18               13.5310             31
                         Total             528

Source: Arthur S. Goldberger, Introductory Econometrics, Harvard University Press, Cambridge, Mass., 1998, Table 1.1, p. 5 (adapted).


¹⁴Ernst R. Berndt, The Practice of Econometrics: Classic and Contemporary, Addison Wesley, Reading, Mass., 1991.
Incidentally, this is an excellent book that the reader may want to read to find out how econometricians go about doing 
research. 


[Figure 2.6: mean wage (vertical axis) plotted against education, in years of schooling (horizontal axis, 6 to 18); the plotted points are the mean values.]
Figure 2.6 Relationship between mean wages and education.


Example 2.2 Mathematics SAT Scores by Family Income 


Table 2.10 in Exercise 2.17 provides data on mean SAT (Scholastic Aptitude Test) scores on critical reading, 
mathematics, and writing for college-bound seniors based on 947,347 students taking the SAT examination 
in 2007. Plotting the mean mathematics scores on mean family income, we obtain the picture in Figure 2.7. 

Note: Because of the open-ended income brackets for the first and last income categories shown in Table 
2.10, the lowest average family income is assumed to be $5,000 and the highest average family income is 
assumed to be $150,000. 
Figure 2.7 Relationship between mean mathematics SAT scores and mean family income.

As Figure 2.7 shows, the average mathematics score increases as average family income increases. Since the number of students taking the SAT examination is quite large, the data probably represent the entire population of seniors taking the examination. Therefore, the regression line sketched in Figure 2.7 probably represents the population regression line.
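The strength of the positive relationship in Figure 2.7 can be summarized numerically. The sketch below pairs the mean mathematics scores from Table 2.10 with one income value per bracket: 5,000 dollars and 150,000 dollars for the open-ended brackets, as assumed in the note above, and the bracket midpoint for the others, which is our own assumption rather than the text's. It also checks that the test-taker counts add up to the 947,347 students mentioned in the text. The function name `pearson_r` is illustrative.

```python
# Mean mathematics scores by income bracket (Table 2.10).
# Income values: 5,000 and 150,000 for the open-ended brackets (as assumed in the
# text); bracket midpoints for the others (our own assumption).
income = [5_000, 15_000, 25_000, 35_000, 45_000, 55_000, 65_000, 75_000, 90_000, 150_000]
math_mean = [451, 472, 465, 485, 486, 504, 511, 516, 529, 556]
test_takers = [40610, 72745, 61244, 83685, 75836, 80060, 75763, 81627, 130752, 245025]


def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


r = pearson_r(income, math_mean)
print("total test takers:", sum(test_takers))
print(f"correlation(income, mean math score) = {r:.3f}")
```

Under these assumptions the correlation comes out at about 0.97, strongly positive, even though the 20,000 to 30,000 bracket scores slightly below the 10,000 to 20,000 bracket.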


There may be several reasons for the observed positive relationship between the two variables. For 
example, one might argue that students with higher family income can better afford private tutoring for the 
SAT examinations. In addition, students with higher family income are more likely to have parents who are 
highly educated. It is also possible that students with higher mathematics scores come from better schools. 
The reader can provide other explanations for the observed positive relationship between the two variables. 


Summary and Conclusions 


1. The key concept underlying regression analysis is the concept of the conditional expectation function 
(CEF), or population regression function (PRF). Our objective in regression analysis is to find out 
how the average value of the dependent variable (or regressand) varies with the given value of the 
explanatory variable (or regressor). 

2. This book largely deals with linear PRFs, that is, regressions that are linear in the parameters. They 
may or may not be linear in the regressand or the regressors. 

3. For empirical purposes, it is the stochastic PRF that matters. The stochastic disturbance term $u_i$ plays a critical role in estimating the PRF.

4. The PRF is an idealized concept, since in practice one rarely has access to the entire population of 
interest. Usually, one has a sample of observations from the population. Therefore, one uses the 
stochastic sample regression function (SRF) to estimate the PRF. How this is actually accomplished 
is discussed in Chapter 3. 


Multiple Choice Questions 


Choose the best alternative for each question.

1. Regression analysis is concerned with estimating 
a. The mean value of the dependent variable 
b. The mean value of the explanatory variable 
c. The mean value of the correlation coefficient 
d. The mean value of the fixed variable 

2. The locus of the conditional means of Y for the fixed values of X is the 
a. Conditional expectation function 
b. Intercept line 
c. Population regression line 
d. Linear regression line 

3. $E(Y \mid X_i) = f(X_i)$ is referred to as
a. Conditional expectation function 
b. Intercept line 
c. Population regression line 
d. Linear regression line 
4. Linear regression model is
a. Linear in explanatory variables but may not be linear in parameters
b. Nonlinear in parameters and must be linear in variables
c. Linear in parameters and must be linear in variables
d. Linear in parameters and may not be linear in variables

5. In $Y_i = \beta_1 + \beta_2 X_i + u_i$, $u_i$ can take values that are
a. Only positive
b. Only negative
c. Only zero
d. Positive, negative or zero

6. In $Y_i = E(Y \mid X_i) + u_i$, the deterministic component is given by
a. $Y_i$
b. $E(Y \mid X_i)$
c. $u_i$
d. $E(Y \mid X_i) + u_i$

7. In $Y_i = E(Y \mid X_i) + u_i$, the nonsystematic random component is
a. $Y_i$
b. $E(Y \mid X_i)$
c. $u_i$
d. $E(Y \mid X_i) + u_i$

8. For a regression line that passes through the conditional means of Y, $E(Y \mid X_i)$ is
a. Always a positive value
b. Always a negative value
c. Always zero
d. Any of the above

9. In $Y_i = \beta_1 + \beta_2 X_i + u_i$, $u_i$
a. Represents the missing values of Y
b. Acts as proxy for all the omitted variables that may affect Y
c. Acts as proxy for important variables that affect Y
d. Represents measurement errors

10. The sample regression line is at best an approximation of the true population regression line. This statement
a. Is always true
b. Is always false
c. May sometimes be true, sometimes false
d. Nonsense statement

11. $Y_i = \beta_1 + \beta_2 X_i + u_i$ represents
a. Sample regression function
b. Population regression function
c. Nonlinear regression function
d. Estimate of regression function

12. $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$ represents
a. Sample regression function
b. Population regression function
c. Nonlinear regression function
d. Estimate of regression function


13. In $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$, $\hat{\beta}_1$ and $\hat{\beta}_2$ represent
a. Fixed component
b. Residual component
c. Estimates
d. Estimators

14. In $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$, $\hat{u}_i$ represents
a. Fixed component
b. Residual component
c. Estimates
d. Estimators

15. In the sample regression function, the observed $Y_i$ can be expressed as $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + u_i$. This statement is
a. True
b. False
c. Depends on f, 
d. Depends on W 
16. The regression model includes a random error or disturbance term for a variety of reasons. Which of the following is NOT one of them?
a. Individual Y observations are intrinsically random even if they are measured correctly
b. Influence of variables other than X (omitted variables)
c. Unavailability of measurable data based on theory
d. Approximation errors in the calculation of the least squares estimates

17. In the simple linear regression model, the regression slope
a. Indicates by how many percent Y increases, given a one percent increase in X
b. When multiplied with the explanatory variable will give you the predicted Y
c. Indicates by how many units Y increases, given a one unit increase in X
d. Represents the elasticity of Y on X

18. The statement "There can be more than one SRF representing a population regression function" is
a. Always true
b. Always false
c. Sometimes true, sometimes false
d. Nonsense statement

19. Any regression equation written in its deviation form would not pass through the origin. This statement is
a. Always true
b. Always false
c. Sometimes true, sometimes false
d. Nonsense statement

20. The slope coefficient from a regression of $Y_i$ on $X_i$ is the same as the slope coefficient from a regression of $y_i$ on $x_i$, where $y_i$ and $x_i$ are deviations from their mean values. This statement is
a. Always true
b. Always false
c. Sometimes true, sometimes false
d. Nonsense statement
Exercises

Questions

2.1. What is the conditional expectation function or the population regression function?

2.2. What is the difference between the population and sample regression functions? Is this a distinction without difference?

2.3. What is the role of the stochastic error term $u_i$ in regression analysis? What is the difference between the stochastic error term and the residual, $\hat{u}_i$?

2.4. Why do we need regression analysis? Why not simply use the mean value of the regressand as its best value?

2.5. What do we mean by a linear regression model?

2.6. Determine whether the following models are linear in the parameters, or the variables, or both. Which of these models are linear regression models?

Model                                                               Descriptive Title
a. $Y_i = \beta_1 + \beta_2 \left(\frac{1}{X_i}\right) + u_i$            Reciprocal
b. $Y_i = \beta_1 + \beta_2 \ln X_i + u_i$                          Semilogarithmic
c. $\ln Y_i = \beta_1 + \beta_2 X_i + u_i$                          Inverse semilogarithmic
d. $\ln Y_i = \ln \beta_1 + \beta_2 \ln X_i + u_i$                  Logarithmic or double logarithmic
e. $\ln Y_i = \beta_1 - \beta_2 \left(\frac{1}{X_i}\right) + u_i$        Logarithmic reciprocal

Note: ln = natural log (i.e., log to the base e); $u_i$ is the stochastic disturbance term. We will study these models in Chapter 6.

2.7. Are the following models linear regression models? Why or why not?
a. $Y_i = e^{\beta_1 + \beta_2 X_i + u_i}$
b. $Y_i = \dfrac{1}{1 + e^{\beta_1 + \beta_2 X_i + u_i}}$
c. $\ln Y_i = \beta_1 + \beta_2 \left(\frac{1}{X_i}\right) + u_i$
d. Y; = pi + (0.75 — pi) -d + u; 
e. Y; = pi + BX; + ui 
2.8. What is meant by an intrinsically linear regression model? If $\beta_2$ in Exercise 2.7d were 0.8, would it be a linear or nonlinear regression model?

2.9. Consider the following nonstochastic models (i.e., models without the stochastic error term). Are they linear regression models? If not, is it possible, by suitable algebraic manipulations, to convert them into linear models?
a. $Y_i = \dfrac{1}{\beta_1 + \beta_2 X_i}$
b. $Y_i = \dfrac{X_i}{\beta_1 + \beta_2 X_i}$
c. $Y_i = \dfrac{1}{1 + \exp(-\beta_1 - \beta_2 X_i)}$
2.10. You are given the scattergram in Figure 2.8 along with the regression line. What general conclusion do you draw from this diagram? Is the regression line sketched in the diagram a population regression line or the sample regression line?
Figure 2.8 Growth rates of real manufacturing wages and exports. Data are for 50 developing countries during 1970-90.
Source: The World Bank, World Development Report 1995, p. 55. The original source is UNIDO data, World Bank data. 


2.11. From the scattergram given in Figure 2.9, what general conclusions do you draw? What is the economic 
theory that underlies this scattergram? (Hint: Look up any international economics textbook and read 
up on the Heckscher-Ohlin model of trade.) 

2.12. What does the scattergram in Figure 2.10 reveal? On the basis of this diagram, would you argue that 
minimum wage laws are good for economic well-being? 

2.13. Is the regression line shown in Figure I.3 of the Introduction the PRF or the SRF? Why? How would 
you interpret the scatterpoints around the regression line? Besides GDP, what other factors, or variables, 
might determine personal consumption expenditure? 


Empirical Exercises 


2.14. You are given the data in Table 2.7 for the United States for years 1980-2006. 

a. Plot the male civilian labor force participation rate against male civilian unemployment rate. 
Eyeball a regression line through the scatter points. A priori, what is the expected relationship 
between the two and what is the underlying economic theory? Does the scattergram support the 
theory? 

b. Repeat (a) for females. 

c. Now plot both the male and female labor participation rates against average hourly earnings (in 
1982 dollars). (You may use separate diagrams.) Now what do you find? And how would you 
rationalize your finding? 

d. Can you plot the labor force participation rate against the unemployment rate and the average 
hourly earnings simultaneously? If not, how would you verbalize the relationship among the three 
variables? 
Figure 2.9 Skill intensity of exports and human capital endowment. Data are for 126 industrial and developing countries in 1985. Values along the horizontal axis are logarithms of the ratio of the country's average educational attainment to its land area; vertical axis values are logarithms of the ratio of manufactured to primary-products exports.


Source: World Bank, World Development Report 1995, p. 59. Original sources: Export data from United Nations Statistical Office COMTRADE database; education data from UNDP 1990; land data from the World Bank.


[Figure 2.10: vertical axis: ratio of one year's salary at minimum wage to GNP per capita; horizontal axis: GNP per capita (thousands of dollars).]
Figure 2.10 The minimum wage and GNP per capita. The sample consists of 17 developing countries. Years vary by 
country from 1988 to 1992. Data are in international prices. 
Source: World Bank, World Development Report 1995, p. 75.
Table 2.7 Labor Force Participation Data for U.S. for 1980-2006

Year   CLFPRM¹    CLFPRF²    UNRM³      UNRF⁴      AHE82⁵     AHE⁶
1980   77.40000   51.50000   6.900000   7.400000   7.990000    6.840000
1981   77.00000   52.10000   7.400000   7.900000   7.880000    7.430000
1982   76.60000   52.60000   9.900000   9.400000   7.860000    7.860000
1983   76.40000   52.90000   9.900000   9.200000   7.950000    8.190000
1984   76.40000   53.60000   7.400000   7.600000   7.950000    8.480000
1985   76.30000   54.50000   7.000000   7.400000   7.910000    8.730000
1986   76.30000   55.30000   6.900000   7.100000   7.960000    8.920000
1987   76.20000   56.00000   6.200000   6.200000   7.860000    9.130000
1988   76.20000   56.60000   5.500000   5.600000   7.810000    9.430000
1989   76.40000   57.40000   5.200000   5.400000   7.750000    9.800000
1990   76.40000   57.50000   5.700000   5.500000   7.660000   10.190000
1991   75.80000   57.40000   7.200000   6.400000   7.580000   10.500000
1992   75.80000   57.80000   7.900000   7.000000   7.550000   10.760000
1993   75.40000   57.90000   7.200000   6.600000   7.520000   11.030000
1994   75.10000   58.80000   6.200000   6.000000   7.530000   11.320000
1995   75.00000   58.90000   5.600000   5.600000   7.530000   11.640000
1996   74.90000   59.30000   5.400000   5.400000   7.570000   12.030000
1997   75.00000   59.80000   4.900000   5.000000   7.680000   12.490000
1998   74.90000   59.80000   4.400000   4.600000   7.890000   13.000000
1999   74.70000   60.00000   4.100000   4.300000   8.000000   13.470000
2000   74.80000   59.90000   3.900000   4.100000   8.030000   14.000000
2001   74.40000   59.80000   4.800000   4.700000   8.110000   14.530000
2002   74.10000   59.60000   5.900000   5.600000   8.240000   14.950000
2003   73.50000   59.50000   6.300000   5.700000   8.270000   15.350000
2004   73.30000   59.20000   5.600000   5.400000   8.230000   15.670000
2005   73.30000   59.30000   5.100000   5.100000   8.170000   16.110000
2006   73.50000   59.40000   4.600000   4.600000   8.230000   16.730000

Table citations below refer to the source document.
¹CLFPRM, Civilian labor force participation rate, male (%), Table B-39, p. 277.
²CLFPRF, Civilian labor force participation rate, female (%), Table B-39, p. 277.
³UNRM, Civilian unemployment rate, male (%), Table B-42, p. 280.
⁴UNRF, Civilian unemployment rate, female (%), Table B-42, p. 280.
⁵AHE82, Average hourly earnings (1982 dollars), Table B-47, p. 286.
⁶AHE, Average hourly earnings (current dollars), Table B-47, p. 286.


Source: Economic Report of the President, 2007. 
2.15. Table 2.8 gives data on expenditure on food and total expenditure, measured in rupees, for a sample of 55 rural households from India. (In early 2000, a U.S. dollar was about 40 Indian rupees.)

a. Plot the data, using the vertical axis for expenditure on food and the horizontal axis for total expenditure, and sketch a regression line through the scatterpoints.
b. What broad conclusions can you draw from this example?
c. A priori, would you expect expenditure on food to increase linearly as total expenditure increases regardless of the level of total expenditure? Why or why not? You can use total expenditure as a proxy for total income.
Table 2.8 Food and Total Expenditure (Rupees)

              Food          Total                       Food          Total
Observation   Expenditure   Expenditure   Observation   Expenditure   Expenditure
 1            217.0000      382.0000      29            390.0000      655.0000
 2            196.0000      388.0000      30            385.0000      662.0000
 3            303.0000      391.0000      31            470.0000      663.0000
 4            270.0000      415.0000      32            322.0000      677.0000
 5            325.0000      456.0000      33            540.0000      680.0000
 6            260.0000      460.0000      34            433.0000      690.0000
 7            300.0000      472.0000      35            295.0000      695.0000
 8            325.0000      478.0000      36            340.0000      695.0000
 9            336.0000      494.0000      37            500.0000      695.0000
10            345.0000      516.0000      38            450.0000      720.0000
11            325.0000      525.0000      39            415.0000      721.0000
12            362.0000      554.0000      40            540.0000      730.0000
13            315.0000      575.0000      41            360.0000      731.0000
14            355.0000      579.0000      42            450.0000      733.0000
15            325.0000      585.0000      43            395.0000      745.0000
16            370.0000      586.0000      44            430.0000      751.0000
17            390.0000      590.0000      45            332.0000      752.0000
18            420.0000      608.0000      46            397.0000      752.0000
19            410.0000      610.0000      47            446.0000      769.0000
20            383.0000      616.0000      48            480.0000      773.0000
21            315.0000      618.0000      49            352.0000      773.0000
22            267.0000      623.0000      50            410.0000      775.0000
23            420.0000      627.0000      51            380.0000      785.0000
24            300.0000      630.0000      52            610.0000      788.0000
25            410.0000      635.0000      53            530.0000      790.0000
26            220.0000      640.0000      54            360.0000      795.0000
27            403.0000      648.0000      55            305.0000      801.0000
28            350.0000      650.0000


Source: Chandan Mukherjee, Howard White, and Marc Wuyts, Econometrics and Data Analysis for Developing Countries, Routledge, New York, 1998, p. 457.


2.16. Table 2.9 gives data on mean Scholastic Aptitude Test (SAT) scores for college-bound seniors for 
1972-2007. These data represent the critical reading and mathematics test scores for both male and 
female students. The writing category was introduced in 2006. Therefore, these data are not included. 

a. Use the horizontal axis for years and the vertical axis for SAT scores to plot the critical reading and 
math scores for males and females separately. 

b. What general conclusions do you draw from these graphs? 

c. Knowing the critical reading scores of males and females, how would you go about predicting their 
math scores? 

d. Plot the female math scores against the male math scores. What do you observe? 




Table 2.9 Total Group Mean SAT Reasoning Test Scores: College-Bound Seniors, 1972-2007

         Critical Reading      Mathematics
Year     Male    Female        Male    Female
1972     531     529           527     489
1973     523     521           525     489
1974     524     520           524     488
1975     [?]     509           518     479
1976     511     508           520     475
1977     509     505           520     474
1978     [?]     503           517     474
1979     509     501           516     473
1980     506     498           515     473
1981     508     496           516     473
1982     509     499           516     473
1983     508     498           516     474
1984     511     498           518     478
1985     514     503           522     480
1986     515     504           523     479
1987     [?]     502           523     481
1988     512     499           521     483
1989     510     498           523     482
1990     505     496           524     483
1991     503     495           520     482
1992     504     496           521     484
1993     504     497           524     484
1994     501     497           523     487
1995     505     502           525     490
1996     507     503           527     492
1997     507     503           530     494
1998     509     502           531     496
1999     509     502           531     495
2000     507     504           533     498
2001     509     502           533     498
2002     507     502           534     500
2003     512     503           557     503
2004     512     504           537     501
2005     513     505           538     504
2006     505     502           536     502
2007     504     502           533     499

Note: For 1972-1986 a formula was applied to the original mean and standard deviation to convert the mean to the recentered scale. For 1987-1995 individual student scores were converted to the recentered scale and then the mean was recomputed. From 1996-1999, nearly all students received scores on the recentered scale. Any score on the original scale was converted to the recentered scale prior to computing the mean. From 2000-2007, all scores are reported on the recentered scale.

Source: College Board, 2007.


518 
515 
2.17. Table 2.10 presents data on mean SAT reasoning test scores classified by income for three kinds of
tests: critical reading, mathematics, and writing. In Example 2.2, we presented Figure 2.7, which 
plotted mean math scores on mean family income. 

a. Refer to Figure 2.7 and prepare a similar graph relating average critical reading scores to average 
family income. Compare your results with those shown in Figure 2.7. 

b. Repeat (a), relating average writing scores to average family income and compare your results with 
the other two graphs. 

c. Looking at the three graphs, what general conclusion can you draw? 


Table 2.10 SAT Reasoning Test Classified by Family Income

Family           Number of      Critical Reading    Mathematics     Writing
Income ($)       Test Takers    Mean    SD          Mean    SD      Mean    SD
<10,000          40610          427     107         451     [?]     423     104
10000-20000      72745          453     106         472     113     446     102
20000-30000      61244          454     102         465     107     444      97
30000-40000      83685          476     103         485     106     466      98
40000-50000      75836          489     103         486     105     477      99
50000-60000      80060          497     102         504     104     486      98
60000-70000      75763          504     102         511     103     493      98
70000-80000      81627          508     101         516     103     498      98
80000-100000     130752         520     102         529     104     510     100
>100000          245025         544     105         556     107     537     103

Source: College Board, 2007. College-Bound Seniors, Table 11.


Key to Multiple Choice Questions 


1. (a) 2. (c) 3. (a) 4. (d) 5. (d) 6. (b) 7. (c) 8. (c) 9. (b)
10. (a) 11. (b) 12. (a) 13. (d) 14. (b) 15. (b) 16. (d) 17. (c) 18. (a)
19. (b) 20. (a)


CHAPTER 3

Two-Variable Regression Model: The Problem of Estimation


As noted in Chapter 2, our first task is to estimate the population regression function (PRF) on the basis of the 
sample regression function (SRF) as accurately as possible. In Appendix A we have discussed two generally 
used methods of estimation: (1) ordinary least squares (OLS) and (2) maximum likelihood (ML). By and 
large, it is the method of OLS that is used extensively in regression analysis primarily because it is intuitively 
appealing and mathematically much simpler than the method of maximum likelihood. Besides, as we will 
show later, in the linear regression context the two methods generally give similar results. 


3.1 The Method of Ordinary Least Squares 


The method of ordinary least squares is attributed to Carl Friedrich Gauss, a German mathematician. 
Under certain assumptions (discussed in Section 3.2), the method of least squares has some very attractive 
statistical properties that have made it one of the most powerful and popular methods of regression analysis. To 
understand this method, we first explain the least-squares principle. 

Recall the two-variable PRF:

$$Y_i = \beta_1 + \beta_2 X_i + u_i \tag{2.4.2}$$

However, as we noted in Chapter 2, the PRF is not directly observable. We estimate it from the SRF:

$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i \tag{2.6.2}$$

$$Y_i = \hat{Y}_i + \hat{u}_i \tag{2.6.3}$$

where $\hat{Y}_i$ is the estimated (conditional mean) value of $Y_i$.
But how is the SRF itself determined? To see this, let us proceed as follows. First, express Eq. (2.6.3) as

$$\hat{u}_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i \tag{3.1.1}$$

which shows that the $\hat{u}_i$ (the residuals) are simply the differences between the actual and estimated Y values.

Now given n pairs of observations on Y and X, we would like to determine the SRF in such a manner that it is as close as possible to the actual Y. To this end, we may adopt the following criterion: Choose the SRF in such a way that the sum of the residuals $\sum \hat{u}_i = \sum (Y_i - \hat{Y}_i)$ is as small as possible. Although intuitively appealing, this is not a very good criterion, as can be seen in the hypothetical scattergram shown in Figure 3.1.


Figure 3.1 Least-squares criterion.


If we adopt the criterion of minimizing $\sum \hat{u}_i$, Figure 3.1 shows that the residuals $\hat{u}_2$ and $\hat{u}_3$ as well as the residuals $\hat{u}_1$ and $\hat{u}_4$ receive the same weight in the sum $(\hat{u}_1 + \hat{u}_2 + \hat{u}_3 + \hat{u}_4)$, although $\hat{u}_2$ and $\hat{u}_3$ are much closer to the SRF than $\hat{u}_1$ and $\hat{u}_4$. In other words, all the residuals receive equal importance no matter how close or how widely scattered the individual observations are from the SRF. A consequence of this is that it is quite possible that the algebraic sum of the $\hat{u}_i$ is small (even zero) although the $\hat{u}_i$ are widely scattered about the SRF. To see this, let $\hat{u}_1$, $\hat{u}_2$, $\hat{u}_3$, and $\hat{u}_4$ in Figure 3.1 assume the values of $10$, $-2$, $+2$, and $-10$, respectively. The algebraic sum of these residuals is zero although $\hat{u}_1$ and $\hat{u}_4$ are scattered more widely around the SRF than $\hat{u}_2$ and $\hat{u}_3$. We can avoid this problem if we adopt the least-squares criterion, which states that the SRF can be fixed in such a way that

$$\sum \hat{u}_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2 \tag{3.1.2}$$

is as small as possible, where $\hat{u}_i^2$ are the squared residuals. By squaring $\hat{u}_i$, this method gives more weight to residuals such as $\hat{u}_1$ and $\hat{u}_4$ in Figure 3.1 than to the residuals $\hat{u}_2$ and $\hat{u}_3$. As noted previously, under the minimum $\sum \hat{u}_i$ criterion, the sum can be small even though the $\hat{u}_i$ are widely spread about the SRF. But this is not possible under the least-squares procedure, for the larger the $\hat{u}_i$ (in absolute value), the larger the $\sum \hat{u}_i^2$. A further justification for the least-squares method lies in the fact that the estimators obtained by it have some very desirable statistical properties, as we shall see shortly.
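The arithmetic behind this argument is easy to check. The sketch below takes the four hypothetical residual values given in the text and evaluates both criteria.

```python
# Hypothetical residuals u1, u2, u3, u4 from the text (Figure 3.1)
residuals = [10, -2, 2, -10]

sum_of_residuals = sum(residuals)                # criterion: minimize the sum of residuals
sum_of_squares = sum(r ** 2 for r in residuals)  # least-squares criterion

print("sum of residuals:        ", sum_of_residuals)  # 0
print("sum of squared residuals:", sum_of_squares)    # 208
# Share of the squared total contributed by the two large residuals, u1 and u4
print(f"share from u1 and u4: {(10 ** 2 + 10 ** 2) / sum_of_squares:.1%}")
```

The plain sum is exactly zero and so hides the scatter entirely, whereas the sum of squares is 208, about 96 percent of which comes from $\hat{u}_1$ and $\hat{u}_4$.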


Table 3.1 Experimental Determination of the SRF

 Yi    Xi    Ŷ1i      û1i       û1i²      Ŷ2i    û2i    û2i²
 (1)   (2)   (3)      (4)       (5)       (6)    (7)    (8)
  4     1    2.929     1.071     1.147     4      0      0
  5     4    7.000    -2.000     4.000     7     -2      4
  7     5    8.357    -1.357     1.841     8     -1      1
 12     6    9.714     2.286     5.226     9      3      9
Sum:
 28    16   28.000     0.000    12.214    28      0     14

Notes: $\hat{Y}_{1i} = 1.572 + 1.357X_i$ (i.e., $\hat{\beta}_1 = 1.572$ and $\hat{\beta}_2 = 1.357$)
$\hat{Y}_{2i} = 3.0 + 1.0X_i$ (i.e., $\hat{\beta}_1 = 3$ and $\hat{\beta}_2 = 1.0$)
$\hat{u}_{1i} = (Y_i - \hat{Y}_{1i})$
$\hat{u}_{2i} = (Y_i - \hat{Y}_{2i})$


It is obvious from Eq. (3.1.2) that

$$\sum \hat{u}_i^2 = f(\hat{\beta}_1, \hat{\beta}_2) \tag{3.1.3}$$


that is, the sum of the squared residuals is some function of the estimators $\hat{\beta}_1$ and $\hat{\beta}_2$. For any given set of data, choosing different values for $\hat{\beta}_1$ and $\hat{\beta}_2$ will give different $\hat{u}$'s and hence different values of $\sum \hat{u}_i^2$. To see this clearly, consider the hypothetical data on Y and X given in the first two columns of Table 3.1. Let us now conduct two experiments. In experiment 1, let $\hat{\beta}_1 = 1.572$ and $\hat{\beta}_2 = 1.357$ (let us not worry right now about how we got these values; say, it is just a guess).¹ Using these $\hat{\beta}$ values and the X values given in column (2) of Table 3.1, we can easily compute the estimated $Y_i$ given in column (3) of the table as $\hat{Y}_{1i}$ (the subscript 1 is to denote the first experiment). Now let us conduct another experiment, but this time using the values of $\hat{\beta}_1 = 3$ and $\hat{\beta}_2 = 1$. The estimated values of $Y_i$ from this experiment are given as $\hat{Y}_{2i}$ in column (6) of Table 3.1. Since the $\hat{\beta}$ values in the two experiments are different, we get different values for the estimated residuals, as shown in the table; $\hat{u}_{1i}$ are the residuals from the first experiment and $\hat{u}_{2i}$ from the second experiment. The squares of these residuals are given in columns (5) and (8). Obviously, as expected from Eq. (3.1.3), these residual sums of squares are different since they are based on different sets of $\hat{\beta}$ values.
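The two experiments, and the "infinite patience" search that follows them, can be sketched in a few lines. The helper `rss` (a hypothetical name) evaluates Eq. (3.1.3) for any trial pair of coefficients on the data of Table 3.1; the grid of trial values (step 0.01) is an arbitrary choice for illustration.

```python
# Hypothetical data from Table 3.1
Y = [4, 5, 7, 12]
X = [1, 4, 5, 6]


def rss(b1, b2, x, y):
    """Residual sum of squares for the trial line y-hat = b1 + b2 * x (Eq. 3.1.3)."""
    return sum((yi - (b1 + b2 * xi)) ** 2 for xi, yi in zip(x, y))


rss_1 = rss(1.572, 1.357, X, Y)  # experiment 1
rss_2 = rss(3.0, 1.0, X, Y)      # experiment 2
print(f"experiment 1: {rss_1:.3f}")
print(f"experiment 2: {rss_2:.3f}")

# Brute-force trial and error: b1 from 0.00 to 4.00, b2 from 0.50 to 2.00, step 0.01
grid = [(i / 100, j / 100) for i in range(0, 401) for j in range(50, 201)]
best = min(grid, key=lambda b: rss(b[0], b[1], X, Y))
print("best grid point:", best, f"RSS = {rss(*best, X, Y):.4f}")
```

The grid search lands near $\hat{\beta}_1 \approx 1.56$, $\hat{\beta}_2 \approx 1.36$ with a residual sum of squares only slightly above 12.214; with a step of 0.01 it cannot get any closer, and tens of thousands of trials are needed even for four observations, which is why a closed-form shortcut is welcome.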

¹For the curious, these values are obtained by the method of least squares, discussed shortly. See Eqs. (3.1.6) and (3.1.7).

Now which set of $\hat{\beta}$ values should we choose? The $\hat{\beta}$ values of the first experiment give us a lower $\sum \hat{u}_i^2$ (= 12.214) than the $\hat{\beta}$ values of the second experiment (= 14), so we might say that the $\hat{\beta}$'s of the first experiment are the “best” values. But how do we know? If we had infinite time and infinite patience, we could conduct many more such experiments. Each time we would choose a different set of $\hat{\beta}$'s and compare the resulting $\sum \hat{u}_i^2$. We would then choose the set of $\hat{\beta}$ values that gives the least possible value of $\sum \hat{u}_i^2$, assuming of course that we had considered all the conceivable values of $\beta_1$ and $\beta_2$. But since time, and certainly patience, are generally in short supply, we need a shortcut for this trial-and-error process. Fortunately, the method of least squares provides one. The principle or method of least squares chooses $\hat{\beta}_1$ and $\hat{\beta}_2$ so that, for a given sample or set of data, $\sum \hat{u}_i^2$ is as small as possible. In other words, for a given sample, the method of least squares provides unique estimates of $\beta_1$ and $\beta_2$ that give the smallest possible value of $\sum \hat{u}_i^2$. How is this accomplished? It is a straightforward exercise in differential calculus. As shown in Appendix 3A, Section 3A.1, the process of differentiation yields the following equations for estimating $\beta_1$ and $\beta_2$:


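$$\sum Y_i = n\hat{\beta}_1 + \hat{\beta}_2 \sum X_i \tag{3.1.4}$$

$$\sum Y_i X_i = \hat{\beta}_1 \sum X_i + \hat{\beta}_2 \sum X_i^2 \tag{3.1.5}$$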


where n is the sample size. These simultaneous equations are known as the normal equations. 
Solving the normal equations simultaneously, we obtain 


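$$\hat{\beta}_2 = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum x_i y_i}{\sum x_i^2} \tag{3.1.6}$$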


where $\bar{X}$ and $\bar{Y}$ are the sample means of X and Y and where we define $x_i = (X_i - \bar{X})$ and $y_i = (Y_i - \bar{Y})$.
Henceforth, we adopt the convention of letting the lowercase letters denote deviations from mean values. 


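$$\hat{\beta}_1 = \frac{\sum X_i^2 \sum Y_i - \sum X_i \sum X_i Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2} = \bar{Y} - \hat{\beta}_2 \bar{X} \tag{3.1.7}$$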


The last step in Eq. (3.1.7) can be obtained directly from Eq. (3.1.4) by simple algebraic manipulation. Incidentally, by making use of simple algebraic identities, formula (3.1.6) for estimating $\beta_2$ can also be expressed as²


$$\hat{\beta}_2 = \frac{\sum x_i Y_i}{\sum X_i^2 - n\bar{X}^2} \tag{3.1.8}$$


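²Note 1: $\sum x_i^2 = \sum (X_i - \bar{X})^2 = \sum X_i^2 - 2\sum X_i \bar{X} + \sum \bar{X}^2 = \sum X_i^2 - 2\bar{X}\sum X_i + \sum \bar{X}^2$, since $\bar{X}$ is a constant. Further noting that $\sum X_i = n\bar{X}$ and $\sum \bar{X}^2 = n\bar{X}^2$, since $\bar{X}$ is a constant, we finally get $\sum x_i^2 = \sum X_i^2 - n\bar{X}^2$.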

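Note 2: $\sum x_i y_i = \sum x_i (Y_i - \bar{Y}) = \sum x_i Y_i - \bar{Y}\sum x_i = \sum x_i Y_i - \bar{Y}\sum (X_i - \bar{X}) = \sum x_i Y_i$, since $\bar{Y}$ is a constant and since the sum of deviations of a variable from its mean value [e.g., $\sum (X_i - \bar{X})$] is always zero. Likewise, $\sum y_i = \sum (Y_i - \bar{Y}) = 0$.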
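As a check on the algebra, the following Python sketch evaluates the raw-sum and deviation forms of Eq. (3.1.6), the alternative form (3.1.8), and both forms of Eq. (3.1.7). As before, it assumes that the Table 3.1 values are Y = (4, 5, 7, 12) and X = (1, 4, 5, 6). All variable names are illustrative.

```python
# Assumed Table 3.1 data.
Y = [4, 5, 7, 12]
X = [1, 4, 5, 6]
n = len(Y)

sum_x, sum_y = sum(X), sum(Y)
sum_xx = sum(x * x for x in X)
sum_xy = sum(x * y for x, y in zip(X, Y))
x_bar, y_bar = sum_x / n, sum_y / n
den = n * sum_xx - sum_x ** 2              # common denominator of (3.1.6) and (3.1.7)

# Eq. (3.1.6), raw-sum form
b2_raw = (n * sum_xy - sum_x * sum_y) / den

# Eq. (3.1.6), deviation form: sum(x_i * y_i) / sum(x_i^2)
dev_x = [x - x_bar for x in X]
dev_y = [y - y_bar for y in Y]
b2_dev = sum(a * b for a, b in zip(dev_x, dev_y)) / sum(a * a for a in dev_x)

# Eq. (3.1.8): sum(x_i * Y_i) / (sum(X_i^2) - n * Xbar^2)
b2_alt = sum(a * y for a, y in zip(dev_x, Y)) / (sum_xx - n * x_bar ** 2)

# Eq. (3.1.7), raw-sum form and sample-means form
b1_raw = (sum_xx * sum_y - sum_x * sum_xy) / den
b1_means = y_bar - b2_raw * x_bar

print(round(b2_raw, 4), round(b2_dev, 4), round(b2_alt, 4))
print(round(b1_raw, 4), round(b1_means, 4))
```

All three slope formulas give $\hat{\beta}_2 = 76/56 \approx 1.3571$, and both intercept formulas give $\hat{\beta}_1 = 88/56 \approx 1.5714$. The value 1.572 used in experiment 1 is what one obtains by substituting the rounded slope 1.357 into $\bar{Y} - \hat{\beta}_2 \bar{X} = 7 - 1.357 \times 4$.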
The estimators obtained previously are known as the least-squares estimators, for they are derived from the least-squares principle. Note the following numerical properties of estimators obtained by the method of OLS: “Numerical properties are those that hold as a consequence of the use of ordinary least squares, regardless of how the data were generated.”³ Shortly, we will also consider the statistical properties of OLS estimators, that is, properties “that hold only under certain assumptions about the way the data were generated.”⁴ (See the classical linear regression model in Section 3.2.)


I. The OLS estimators are expressed solely in terms of the observable (i.e., sample) quantities (i.e., X and 
Y). Therefore, they can be easily computed. 

II. They are point estimators; that is, given the sample, each estimator will provide only a single (point) 
value of the relevant population parameter. (In Chapter 5 we will consider the so-called interval 
estimators, which provide a range of possible values for the unknown population parameters.) 

III. Once the OLS estimates are obtained from the sample data, the sample regression line (Figure 3.1) can be easily obtained. The regression line thus obtained has the following properties:
1. It passes through the sample means of Y and X. This fact is obvious from Eq. (3.1.7), for the latter can be written as $\bar{Y} = \hat{\beta}_1 + \hat{\beta}_2 \bar{X}$, which is shown diagrammatically in Figure 3.2.
2. The mean value of the estimated Y ($= \hat{Y}_i$) is equal to the mean value of the actual Y, for


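$$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i = (\bar{Y} - \hat{\beta}_2 \bar{X}) + \hat{\beta}_2 X_i = \bar{Y} + \hat{\beta}_2 (X_i - \bar{X}) \tag{3.1.9}$$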


Figure 3.2 Diagram showing that the sample regression line passes through the sample mean values of Y and X.


³Russell Davidson and James G. MacKinnon, Estimation and Inference in Econometrics, Oxford University Press, New York, 1993, p. 3.
⁴Ibid.
Summing both sides of this last equality over the sample values and dividing through by the sample size n gives
$$\bar{\hat{Y}} = \bar{Y} \tag{3.1.10}$$


where use is made of the fact that $\sum (X_i - \bar{X}) = 0$. (Why?)⁵
3. The mean value of the residuals $\hat{u}_i$ is zero. From Appendix 3A, Section 3A.1, the first equation is


$$-2\sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i) = 0$$
But since $\hat{u}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$, the preceding equation reduces to $-2\sum \hat{u}_i = 0$, whence $\bar{\hat{u}} = 0$.⁶
As a result of the preceding property, the sample regression
$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i \tag{2.6.2}$$


can be expressed in an alternative form where both Y and X are expressed as deviations from their 
mean values. To see this, sum (2.6.2) on both sides to give 


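$$\sum Y_i = n\hat{\beta}_1 + \hat{\beta}_2 \sum X_i + \sum \hat{u}_i = n\hat{\beta}_1 + \hat{\beta}_2 \sum X_i \quad \text{since } \sum \hat{u}_i = 0 \tag{3.1.11}$$
Dividing Equation 3.1.11 through by n, we obtain
$$\bar{Y} = \hat{\beta}_1 + \hat{\beta}_2 \bar{X} \tag{3.1.12}$$
which is the same as Eq. (3.1.7). Subtracting Equation 3.1.12 from Eq. (2.6.2), we obtain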


$$Y_i - \bar{Y} = \hat{\beta}_2 (X_i - \bar{X}) + \hat{u}_i$$


or
$$y_i = \hat{\beta}_2 x_i + \hat{u}_i \tag{3.1.13}$$
where $y_i$ and $x_i$, following our convention, are deviations from their respective (sample) mean values.


Equation 3.1.13 is known as the deviation form. Notice that the intercept term $\hat{\beta}_1$ is no longer present in it. But the intercept term can always be estimated by Eq. (3.1.7), that is, from the fact
that the sample regression line passes through the sample means of Y and X. An advantage of the 
deviation form is that it often simplifies computing formulas. 

In passing, note that in the deviation form, the SRF can be written as 


$$\hat{y}_i = \hat{\beta}_2 x_i \tag{3.1.14}$$

whereas in the original units of measurement it was $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$, as shown in Eq. (2.6.1).


⁵Note that this result is true only when the regression model has the intercept term $\beta_1$ in it. As Appendix 6A, Sec. 6A.1 shows, this result need not hold when $\beta_1$ is absent from the model.


⁶This result also requires that the intercept term $\beta_1$ be present in the model (see Appendix 6A, Sec. 6A.1).
4. The residuals $\hat{u}_i$ are uncorrelated with the predicted $Y_i$. This statement can be verified as follows:
using the deviation form, we can write 


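$$\sum \hat{y}_i \hat{u}_i = \hat{\beta}_2 \sum x_i \hat{u}_i = \hat{\beta}_2 \sum x_i (y_i - \hat{\beta}_2 x_i) = \hat{\beta}_2 \sum x_i y_i - \hat{\beta}_2^2 \sum x_i^2 = \hat{\beta}_2^2 \sum x_i^2 - \hat{\beta}_2^2 \sum x_i^2 = 0 \tag{3.1.15}$$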


where use is made of the fact that $\hat{\beta}_2 = \sum x_i y_i / \sum x_i^2$.
5. The residuals $\hat{u}_i$ are uncorrelated with $X_i$; that is, $\sum \hat{u}_i X_i = 0$. This fact follows from Eq. (2) in Appendix 3A, Section 3A.1.
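The five numerical properties listed under III can be verified on the assumed Table 3.1 data (Y = (4, 5, 7, 12), X = (1, 4, 5, 6)). The sketch below fits the line with the raw-sum formulas of Eqs. (3.1.6) and (3.1.7), so property 1 is checked rather than built in. It then tests each property, and the deviation form (3.1.13), up to floating-point tolerance. The dictionary labels are ours.

```python
# Assumed Table 3.1 data.
Y = [4, 5, 7, 12]
X = [1, 4, 5, 6]
n = len(Y)

# OLS estimates from the raw-sum forms of Eqs. (3.1.6) and (3.1.7).
sx, sy = sum(X), sum(Y)
sxx = sum(x * x for x in X)
sxy = sum(x * y for x, y in zip(X, Y))
den = n * sxx - sx ** 2
b2 = (n * sxy - sx * sy) / den
b1 = (sxx * sy - sx * sxy) / den

x_bar, y_bar = sx / n, sy / n
y_hat = [b1 + b2 * x for x in X]                 # fitted values
u_hat = [y - yh for y, yh in zip(Y, y_hat)]      # residuals

tol = 1e-9
checks = {
    "1. line passes through (X-bar, Y-bar)": abs(b1 + b2 * x_bar - y_bar) < tol,
    "2. mean of fitted Y equals mean of Y": abs(sum(y_hat) / n - y_bar) < tol,
    "3. residuals sum to zero": abs(sum(u_hat)) < tol,
    "4. residuals uncorrelated with fitted Y": abs(
        sum((yh - y_bar) * u for yh, u in zip(y_hat, u_hat))) < tol,
    "5. residuals uncorrelated with X": abs(sum(u * x for u, x in zip(u_hat, X))) < tol,
    "deviation form (3.1.13) holds": all(
        abs((y - y_bar) - (b2 * (x - x_bar) + u)) < tol for y, x, u in zip(Y, X, u_hat)),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Each check should print True. Note that these are numerical properties: they hold for any data set fitted by OLS with an intercept, not just this one.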


3.2 The Classical Linear Regression Model: The Assumptions Underlying the Method of Least Squares


If our objective is to estimate $\beta_1$ and $\beta_2$ only, the method of OLS discussed in the preceding section will suffice. But recall from Chapter 2 that in regression analysis our objective is not only to obtain $\hat{\beta}_1$ and $\hat{\beta}_2$ but also to draw inferences about the true $\beta_1$ and $\beta_2$. For example, we would like to know how close $\hat{\beta}_1$ and $\hat{\beta}_2$ are to their counterparts in the population, or how close $\hat{Y}_i$ is to the true $E(Y \mid X_i)$. To that end, we must not only specify the functional form of the model, as in Eq. (2.4.2), but also make certain assumptions about the manner in which the $Y_i$ are generated. To see why this requirement is needed, look at the PRF: $Y_i = \beta_1 + \beta_2 X_i + u_i$. It shows that $Y_i$ depends on both $X_i$ and $u_i$. Therefore, unless we are specific about how $X_i$ and $u_i$ are created or generated, there is no way we can make any statistical inference about the $Y_i$ and also, as we shall see, about $\beta_1$ and $\beta_2$. Thus, the assumptions made about the $X_i$ variable(s) and the error term are extremely critical to the valid interpretation of the regression estimates.

The Gaussian, standard, or classical linear regression model (CLRM), which is the cornerstone of 
most econometric theory, makes seven assumptions.⁷ We first discuss these assumptions in the context of the
two-variable regression model; and in Chapter 7 we extend them to multiple regression models, that is, 
models in which there is more than one regressor. 


Assumption 1 


Linear Regression Model: The regression model is linear in the parameters, though it may or may not be linear in the variables. That is, the regression model is as shown in Eq. (2.4.2):
$$Y_i = \beta_1 + \beta_2 X_i + u_i \tag{2.4.2}$$
As will be discussed in Chapter 7, this model can be extended to include more explanatory variables.


We have already discussed model (2.4.2) in Chapter 2. Since linear-in-parameter regression models are the starting point of the CLRM, we will maintain this assumption for most of this book.⁸ Keep in mind that the regressand Y and the regressor X themselves may be nonlinear, as discussed in Chapter 2.


⁷It is classical in the sense that it was developed first by Gauss in 1821 and since then has served as a norm or a standard
against which may be compared the regression models that do not satisfy the Gaussian assumptions. 

⁸However, a brief discussion of nonlinear-in-parameter regression models is given in Chapter 14 for the benefit of more
advanced students. 
Assumption 2 


Fixed X Values or X Values Independent of the Error Term: Values taken by the regressor X may be considered fixed in repeated samples (the case of fixed regressor), or they may be sampled along with the dependent variable Y (the case of stochastic regressor). In the latter case, it is assumed that the X variable(s) and the error term are independent, which implies that $\operatorname{cov}(X_i, u_i) = 0$.


This can be explained in terms of our example given in Table 2.1 (page 39). Consider the various Y popula- 
tions corresponding to the levels of income shown in the table. Keeping the value of income X fixed, say, 
at level $80, we draw at random a family and observe its weekly family consumption Y as, say, $60. Still
keeping X at $80, we draw at random another family and observe its Y value at $75. In each of these drawings 
(i.e., repeated sampling), the value of X is fixed at $80. We can repeat this process for all the X values shown 
in Table 2.1. As a matter of fact, the sample data shown in Tables 2.4 and 2.5 were drawn in this fashion. 

Why do we assume that the X values are nonstochastic? Given that, in most social sciences, data usually 
are collected randomly on both the Y and X variables, it seems natural to assume the opposite—that the X 
variable, like the Y variable, is also random or stochastic. But initially we assume that the X variable(s) is 
nonstochastic for the following reasons: 

First, this is done initially to simplify the analysis and to introduce the reader to the complexities of 
regression analysis gradually. Second, in experimental situations it may not be unrealistic to assume that the 
X values are fixed. For example, a farmer may divide his land into several parcels and apply different amounts 
of fertilizer to these parcels to see its effect on crop yield. Likewise, a department store may decide to offer 
different rates of discount on a product to see its effect on consumers. Sometimes we may want to fix the 
X values for a specific purpose. Suppose we are trying to find out the average weekly earnings of workers 
(Y) with various levels of education (X), as in the case of the data given in Table 2.6. In this case, the X 
variable can be considered fixed or nonrandom. Third, as we show in Chapter 13, even if the X variables are 
stochastic, the statistical results of linear regression based on the case of fixed regressors are also valid when 
the X’s are random, provided that some conditions are met. One condition is that regressor X and the error 
term u are independent. As James Davidson notes, “. . . this model [i.e., stochastic regressors] ‘mimics’ the fixed regressor model, and . . . many of the statistical properties of least squares in the fixed regressor model continue to hold.”⁹

For all these reasons, we will first discuss the (fixed-regressor) CLRM in considerable detail. However, 
in Chapter 13 we will discuss the case of stochastic regressors in some detail and point out the occasions where we need to consider the stochastic regressor models. Incidentally, note that if the X variable(s) is stochastic, the resulting model is called the neo-classical linear regression model (NLRM),¹⁰ in contrast to
the CLRM, where the X’s are treated as fixed or nonrandom. For discussion purposes, we will call the former 
the stochastic regressor model and the latter the fixed regressor model. 


Assumption 3 


Zero Mean Value of Disturbance $u_i$: Given the value of X, the mean, or expected, value of the random disturbance term $u_i$ is zero. Symbolically, we have
$$E(u_i \mid X_i) = 0 \tag{3.2.1}$$
Or, if X is nonstochastic,
$$E(u_i) = 0$$
⁹James Davidson, Econometric Theory, Blackwell Publishers, U.K., 2000, p. 10.
¹⁰A term due to Arthur S. Goldberger, A Course in Econometrics, Harvard University Press, Cambridge, MA, 1991, p. 264.
Assumption 3 states that the mean value of u conditional upon the given X is zero. Geometrically, this assumption can be pictured as in Figure 3.3, which shows a few values of the variable X and the Y populations associated with each of them. As shown, each Y population corresponding to a given X is distributed around its mean value (shown by the circled points on the PRF), with some Y values above the mean and some below it. The distances above and below the mean values are nothing but the $u_i$. Equation 3.2.1 requires that the average or mean value of these deviations corresponding to any given X be zero.


Figure 3.3 Conditional distribution of the disturbances $u_i$ (circled points: mean of each Y population; PRF: $Y_i = \beta_1 + \beta_2 X_i$).


This assumption should not be difficult to comprehend in view of the discussion in Section 2.4 (see 
Eq. [2.4.5]). Assumption 3 simply says that the factors not explicitly included in the model, and therefore subsumed in $u_i$, do not systematically affect the mean value of Y. In other words, the positive $u_i$ values cancel out the negative $u_i$ values, so that their average or mean effect on Y is zero.¹¹

In passing, note that the assumption $E(u_i \mid X_i) = 0$ implies that $E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i$. (Why?) Therefore, the two assumptions are equivalent.

It is important to point out that Assumption 3 implies that there is no specification bias or specification 
error in the model used in empirical analysis. In other words, the regression model is correctly specified. 
Leaving out important explanatory variables, including unnecessary variables, or choosing the wrong 
functional form of the relationship between the Y and X variables are some examples of specification error. 
We will discuss this topic in considerable detail in Chapter 13. 

Note also that if the conditional mean of one random variable given another random variable is zero, the 
covariance between the two variables is zero and hence the two variables are uncorrelated. Assumption 3 therefore implies that $X_i$ and $u_i$ are uncorrelated.¹²


¹¹For a more technical reason why Assumption 3 is necessary, see E. Malinvaud, Statistical Methods of Econometrics, Rand McNally, Chicago, 1966, p. 75. See also Exercise 3.3.

¹²The converse, however, is not true because correlation is a measure of linear association only. That is, even if $X_i$ and $u_i$ are uncorrelated, the conditional mean of $u_i$ given $X_i$ may not be zero. However, if $X_i$ and $u_i$ are correlated, $E(u_i \mid X_i)$ must be nonzero, violating Assumption 3. We owe this point to Stock and Watson. See James H. Stock and Mark W. Watson, Introduction to Econometrics, Addison-Wesley, Boston, 2003, pp. 104-105.
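The point of footnote 12 can be checked exactly with a small constructed example (ours, not the book's). Let X take the values −1, 0, and 1 with equal probability, and let the disturbance be the deterministic function $u = X^2 - 2/3$. Then $\operatorname{cov}(X, u) = 0$, yet $E(u \mid X)$ equals −2/3 at X = 0 and 1/3 at X = ±1. So Assumption 3 fails even though X and u are uncorrelated. The sketch uses exact rational arithmetic from Python's `fractions` module.

```python
from fractions import Fraction

# Hypothetical discrete distribution: X in {-1, 0, 1}, each with probability 1/3.
support = [-1, 0, 1]
p = Fraction(1, 3)
e_x2 = sum(p * x * x for x in support)          # E(X^2) = 2/3


def u_given(x):
    """Disturbance as a deterministic function of X: u = X^2 - E(X^2)."""
    return x * x - e_x2


e_x = sum(p * x for x in support)
e_u = sum(p * u_given(x) for x in support)
cov_xu = sum(p * x * u_given(x) for x in support) - e_x * e_u

print("cov(X, u) =", cov_xu)                     # prints 0
for x in support:
    # u is fixed once X is known, so E(u | X = x) is just u_given(x)
    print(f"E(u | X = {x}) =", u_given(x))      # 1/3 or -2/3, never 0
```

Because u depends on X only through $X^2$, the linear measure (the covariance) misses the dependence that the conditional mean picks up.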
The reason for assuming that the disturbance term u and the explanatory variable(s) X are uncorrelated is
simple. When we expressed the PRF as in Eq. (2.4.2), we assumed that X and u (which represent the influence 
of all omitted variables) have separate (and additive) influences on Y. But if X and u are correlated, it is not 
possible to assess their individual effects on Y. Thus, if X and u are positively correlated, X increases when u 
increases and decreases when u decreases. Similarly, if X and u are negatively correlated, X increases when u 
decreases and decreases when u increases. In situations like this it is quite possible that the error term actually 
includes some variables that should have been included as additional regressors in the model. This is why 
Assumption 3 is another way of stating that there is no specification error in the chosen regression model.


Assumption 4 


Homoscedasticity or Constant Variance of $u_i$: The variance of the error, or disturbance, term is the same regardless of the value of X. Symbolically,
$$\begin{aligned}
\operatorname{var}(u_i) &= E[u_i - E(u_i \mid X_i)]^2 \\
&= E(u_i^2 \mid X_i), && \text{because of Assumption 3} \\
&= E(u_i^2), && \text{if } X_i \text{ are nonstochastic} \\
&= \sigma^2
\end{aligned} \tag{3.2.2}$$
where var stands for variance.


Equation 3.2.2 states that the variance of $u_i$ for each $X_i$ (i.e., the conditional variance of $u_i$) is some positive constant number equal to $\sigma^2$. Technically, Eq. (3.2.2) represents the assumption of homoscedasticity, or equal (homo) spread (scedasticity), or equal variance. The word comes from the Greek verb skedannymi, which means to disperse or scatter. Stated differently, Eq. (3.2.2) means that the Y populations corresponding
to various X values have the same variance. Put simply, the variation around the regression line (which is 
the line of average relationship between Y and X) is the same across the X values: it neither increases nor 
decreases as X varies. Diagrammatically, the situation is as depicted in Figure 3.4. 

In contrast, consider Figure 3.5, where the conditional variance of the Y population varies with X. This situation is known appropriately as heteroscedasticity, or unequal spread, or unequal variance. Symbolically, in this situation, Eq. (3.2.2) can be written as
$$\operatorname{var}(u_i \mid X_i) = \sigma_i^2 \tag{3.2.3}$$
Notice the subscript on $\sigma^2$ in Eq. (3.2.3), which indicates that the variance of the Y population is no longer constant.

To make the difference between the two situations clear, let Y represent weekly consumption expenditure 
and X weekly income. Figures 3.4 and 3.5 show that as income increases, the average consumption expen- 
diture also increases. But in Figure 3.4 the variance of consumption expenditure remains the same at all levels 
of income, whereas in Figure 3.5 it increases with increase in income. In other words, richer families on the 
average consume more than poorer families, but there is also more variability in the consumption expenditure 
of the former. 
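A small simulation can make the contrast between Figures 3.4 and 3.5 concrete. The sketch below draws many consumption values at each of a few income levels from a hypothetical PRF. The parameter values (intercept 10, slope 0.6), the income levels, and the two error standard deviations (a constant 5, versus 0.05 times income) are illustrative assumptions, not figures from the text.

```python
import random
import statistics

random.seed(1)  # reproducible draws

# Hypothetical PRF for weekly consumption (Y) on weekly income (X); illustrative values only.
beta1, beta2 = 10.0, 0.6
incomes = [80, 140, 200, 260]
draws_per_level = 2000


def spread_by_income(sd_of):
    """Sample standard deviation of Y around the PRF at each income level.

    sd_of(x) gives the standard deviation of the disturbance at income x.
    """
    out = {}
    for x in incomes:
        ys = [beta1 + beta2 * x + random.gauss(0.0, sd_of(x)) for _ in range(draws_per_level)]
        out[x] = statistics.stdev(ys)
    return out


homo = spread_by_income(lambda x: 5.0)         # var(u | X) = sigma^2, as in Eq. (3.2.2)
hetero = spread_by_income(lambda x: 0.05 * x)  # var(u | X_i) = sigma_i^2, as in Eq. (3.2.3)

for x in incomes:
    print(f"X = {x:3d}: homoscedastic sd = {homo[x]:5.2f}, heteroscedastic sd = {hetero[x]:5.2f}")
```

In the homoscedastic case the spread stays near 5 at every income level. In the heteroscedastic case it grows in proportion to income, from about 4 at X = 80 to about 13 at X = 260, which is the pattern of Figure 3.5.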

To understand the rationale behind this assumption, refer to Figure 3.5. As this figure shows, $\operatorname{var}(u \mid X_1) < \operatorname{var}(u \mid X_2) < \cdots < \operatorname{var}(u \mid X_i)$. Therefore, the Y observations coming from the population with $X = X_1$ are likely to be closer to the PRF than those coming from populations corresponding to $X = X_2$, $X = X_3$, and so on. In short, not all Y values corresponding to the various X’s will be equally reliable, reliability being
judged by how closely or distantly the Y values are distributed around their means, that is, the points on the 
PRF. If this is in fact the case, would we not prefer to sample from those Y populations that are closer to their 
mean than those that are widely spread? But doing so might restrict the variation we obtain across X values. 
Figure 3.4 Homoscedasticity (probability density of $u_i$).

Figure 3.5 Heteroscedasticity (probability density of $u_i$).


By invoking Assumption 4, we are saying that at this stage, all Y values corresponding to the various X’s are equally important. In Chapter 11 we shall see what happens if this is not the case, that is, where there is heteroscedasticity.

In passing, note that Assumption 4 implies that the conditional variances of $Y_i$ are also homoscedastic. That is,
$$\operatorname{var}(Y_i \mid X_i) = \sigma^2 \tag{3.2.4}$$


Of course, the unconditional variance of Y is $\sigma_Y^2$. Later we will see the importance of distinguishing
between conditional and unconditional variances of Y (see Appendix A for details of conditional and 
unconditional variances). 
Assumption 5


No Autocorrelation between the Disturbances: Given any two X values, $X_i$ and $X_j$ ($i \neq j$), the correlation between any two $u_i$ and $u_j$ ($i \neq j$) is zero. In short, the observations are sampled independently. Symbolically,
$$\operatorname{cov}(u_i, u_j \mid X_i, X_j) = 0$$
$$\operatorname{cov}(u_i, u_j) = 0, \quad \text{if X is nonstochastic} \tag{3.2.5}$$
where i and j are two different observations and where cov means covariance.


In words, Equation 3.2.5 postulates that the disturbances $u_i$ and $u_j$ are uncorrelated. Technically, this is the
assumption of no serial correlation, or no autocorrelation. This means that, given X, the deviations of any 
two Y values from their mean value do not exhibit patterns such as those shown in Figures 3.6(a) and (b). In 
Figure 3.6(a), we see that the u’s are positively correlated, a positive u followed by a positive u or a negative 
u followed by a negative u. In Figure 3.6(b), the u’s are negatively correlated, a positive u followed by a 
negative u and vice versa. 

If the disturbances (deviations) follow systematic patterns, such as those shown in Figures 3.6(a) and (b), 
there is auto- or serial correlation, and what Assumption 5 requires is that such correlations be absent. Figure 
3.6(c) shows that there is no systematic pattern to the u’s, thus indicating zero correlation. 
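The three patterns in Figure 3.6 can be mimicked with a first-order autoregressive scheme, $u_t = \rho u_{t-1} + \varepsilon_t$. This is one common way, though not the only one, to generate serially correlated disturbances, and it is used here purely as an illustration ahead of the fuller treatment in Chapter 12. The values of $\rho$ and the function names are hypothetical.

```python
import random

random.seed(2)  # reproducible draws


def ar1_disturbances(rho, n, sd=1.0):
    """Generate u_t = rho * u_{t-1} + e_t, e_t ~ N(0, sd^2), starting from one N(0, sd^2) draw."""
    u = [random.gauss(0.0, sd)]
    for _ in range(n - 1):
        u.append(rho * u[-1] + random.gauss(0.0, sd))
    return u


def lag1_corr(u):
    """Sample correlation between u_t and u_{t-1}."""
    a, b = u[1:], u[:-1]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5


for label, rho in [("(a) positive", 0.8), ("(b) negative", -0.8), ("(c) zero", 0.0)]:
    r = lag1_corr(ar1_disturbances(rho, 5000))
    print(f"{label}: lag-1 correlation of u = {r:+.2f}")
```

Only the third series satisfies Assumption 5. The first two show the runs of same-signed disturbances and the sign alternation sketched in Figures 3.6(a) and (b).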


Figure 3.6 Patterns of correlation among the disturbances. (a) positive serial correlation; (b) negative serial correlation; (c) zero correlation.
The full import of this assumption will be explained thoroughly in Chapter 12. But intuitively one can explain this assumption as follows. Suppose in our PRF ($Y_t = \beta_1 + \beta_2 X_t + u_t$) that $u_t$ and $u_{t-1}$ are positively correlated. Then $Y_t$ depends not only on $X_t$ but also on $u_{t-1}$, for $u_{t-1}$ to some extent determines $u_t$. At this stage of the development of the subject matter, by invoking Assumption 5, we are saying that we will consider the systematic effect, if any, of $X_t$ on $Y_t$ and not worry about the other influences that might act on Y as a result of
the possible intercorrelations among the u’s. But, as noted in Chapter 12, we will see how intercorrelations 
among the disturbances can be brought into the analysis and with what consequences. 

But it should be added here that the justification of this assumption depends on the type of data used in the 
analysis. If the data are cross-sectional and are obtained as a random sample from the relevant population, 
this assumption can often be justified. However, if the data are time series, the assumption of independence is 
difficult to maintain, for successive observations of a time series, such as GDP, are highly correlated. But we 
will deal with this situation when we discuss time series econometrics later in the text. 


Assumption 6 


The Number of Observations n Must Be Greater than the Number of Parameters to Be Estimated: 
Alternatively, the number of observations must be greater than the number of explanatory variables. 


This assumption is not so innocuous as it seems. In the hypothetical example of Table 3.1, imagine that 
we had only the first pair of observations on Y and X (4 and 1). From this single observation there is no way to estimate the two unknowns, $\beta_1$ and $\beta_2$. We need at least two pairs of observations to estimate the two
unknowns. In a later chapter we will see the critical importance of this assumption. 
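In matrix terms, the coefficient matrix of the normal equations (3.1.4) and (3.1.5) has determinant $n\sum X_i^2 - (\sum X_i)^2$, which is also the denominator of Eq. (3.1.6). The sketch below (the function name is ours) shows that this determinant is zero for a single observation, whatever its X value, so the two unknowns cannot be recovered. With two distinct X values it is positive. The first two X values, 1 and 4, are assumed from Table 3.1.

```python
def normal_equation_determinant(x):
    """Determinant of the coefficient matrix of the normal equations (3.1.4)-(3.1.5).

    The matrix is [[n, sum X], [sum X, sum X^2]], so the determinant is
    n * sum(X^2) - (sum X)^2, the denominator of Eq. (3.1.6).
    """
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    return n * sxx - sx * sx


print(normal_equation_determinant([1]))     # one observation: 0, no unique solution
print(normal_equation_determinant([1, 4]))  # two distinct X values: 9, unique solution
```

With n = 1 the determinant is $1 \cdot X_1^2 - X_1^2 = 0$ identically, so no choice of data can rescue the estimation.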


Assumption 7 


The Nature of X Variables: The X values in a given sample must not all be the same. Technically, var (X) 
must be a positive number. Furthermore, there can be no outliers in the values of the X variable, that is, 
values that are very large in relation to the rest of the observations. 


The assumption that there is variability in the X values is also not as innocuous as it looks. Look at 
Eq. (3.1.6). If all the X values are identical, then $X_i = \bar{X}$ (Why?) and the denominator of that equation will be zero, making it impossible to estimate $\beta_2$ and therefore $\beta_1$. Intuitively, we readily see why this assumption is important. Looking at our family consumption expenditure example in Chapter 2, if there is very little variation in family income, we will not be able to explain much of the variation in the consumption expenditure. The reader should keep in mind that variation in both Y and X is essential to use regression analysis as
a research tool. In short, the variables must vary! 
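The role of the variability requirement can be seen directly in code. With identical X values every deviation $x_i$ is zero, so $\sum x_i^2 = 0$ and Eq. (3.1.6) cannot be evaluated. The sketch below (a hypothetical helper) returns None in that case instead of dividing by zero. The Y values are borrowed from the assumed Table 3.1 data.

```python
def slope_or_none(y, x):
    """OLS slope from Eq. (3.1.6), deviation form; None when X does not vary."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        return None  # Assumption 7 violated: sum of squared X deviations is zero
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


print(slope_or_none([4, 5, 7, 12], [1, 4, 5, 6]))   # X varies: a finite slope
print(slope_or_none([4, 5, 7, 12], [5, 5, 5, 5]))   # X constant: None
```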

The requirement that there are no outliers in the X values is to avoid the regression results being dominated 
by such outliers. If there are a few X values that are, say, 20 times the average of the X values, the estimated 
regression lines with or without such observations might be vastly different. Very often such outliers are the 
result of human errors of arithmetic or mixing samples from different populations. In Chapter 13 we will 
discuss this topic further. 
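To see how a single outlying X value can dominate the fit, the sketch below adds one made-up observation to the assumed Table 3.1 data. The new point has X = 80, which is 20 times the mean X of 4, paired with an arbitrary Y = 10. Both the point and its Y value are hypothetical.

```python
def ols_slope(y, x):
    """OLS slope, Eq. (3.1.6) in deviation form."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


Y = [4, 5, 7, 12]
X = [1, 4, 5, 6]
slope_without = ols_slope(Y, X)
slope_with = ols_slope(Y + [10], X + [80])   # hypothetical outlier appended

print(f"slope without outlier: {slope_without:.4f}")
print(f"slope with outlier:    {slope_with:.4f}")
```

A single point moves the estimated slope from about 1.36 to roughly 0.04, because its deviation from $\bar{X}$ accounts for most of $\sum x_i^2$.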

Our discussion of the assumptions underlying the classical linear regression model is now complete. It is 
important to note that all of these assumptions pertain to the PRF only and not the SRF. But it is interesting 
to observe that the method of least squares discussed previously has some properties that are similar to the 
assumptions we have made about the PRF. For example, the finding that $\sum \hat{u}_i = 0$ and, therefore, $\bar{\hat{u}} = 0$ is akin to the assumption that $E(u_i \mid X_i) = 0$. Likewise, the finding that $\sum \hat{u}_i X_i = 0$ is similar to the assumption that $\operatorname{cov}(u_i, X_i) = 0$. It is comforting to note that the method of least squares thus tries to “duplicate” some of the
assumptions we have imposed on the PRF. 
Of course, the SRF does not duplicate all the assumptions of the CLRM. As we will show later, although $\operatorname{cov}(u_i, u_j) = 0$ ($i \neq j$) by assumption, it is not true that the sample $\operatorname{cov}(\hat{u}_i, \hat{u}_j) = 0$ ($i \neq j$). As a matter of fact, we will show later that the residuals are not only autocorrelated but are also heteroscedastic (see Chapter 12).


A Word about These Assumptions 


The million-dollar question is: How realistic are all these assumptions? The “reality of assumptions” is an 
age-old question in the philosophy of science. Some argue that it does not matter whether the assumptions 
are realistic. What matters are the predictions based on those assumptions. Notable among the “irrelevance- 
of-assumptions thesis” is Milton Friedman. To him, unreality of assumptions is a positive advantage: “to be 
important . . . a hypothesis must be descriptively false in its assumptions.”¹³

One may not subscribe to this viewpoint fully, but recall that in any scientific study we make certain 
assumptions because they facilitate the development of the subject matter in gradual steps, not because they 
are necessarily realistic in the sense that they replicate reality exactly. As one author notes, “. . . if simplicity 
is a desirable criterion of good theory, all good theories idealize and oversimplify outrageously.”¹⁴

What we plan to do is first study the properties of the CLRM thoroughly, and then in later chapters 
examine in depth what happens if one or more of the assumptions of CLRM are not fulfilled. At the end 
of this chapter, we provide in Table 3.4 a guide to where one can find out what happens to the CLRM if a 
particular assumption is not satisfied. 

As a colleague pointed out to us, when we review research done by others, we need to consider whether 
the assumptions made by the researcher are appropriate to the data and problem. All too often, published 
research is based on implicit assumptions about the problem and data that are likely not correct and that 
produce estimates based on these assumptions. Clearly, the knowledgeable reader should, realizing these 
problems, adopt a skeptical attitude toward the research. The assumptions listed in Table 3.4 therefore provide 
a checklist for guiding our research and for evaluating the research of others. 

With this backdrop, we are now ready to study the CLRM. In particular, we want to find out the statis- 
tical properties of OLS compared with the purely numerical properties discussed earlier. The statistical 
properties of OLS are based on the assumptions of CLRM already discussed and are enshrined in the famous 
Gauss—Markov theorem. But before we turn to this theorem, which provides the theoretical justification 
for the popularity of OLS, we first need to consider the precision or standard errors of the least-squares 
estimates. 




3.3 Precision or Standard Errors of Least-Squares Estimates 


From Eqs. (3.1.6) and (3.1.7), it is evident that least-squares estimates are a function of the sample data. But since the data are likely to change from sample to sample, the estimates will change ipso facto. Therefore, what is needed is some measure of “reliability” or precision of the estimators $\hat{\beta}_1$ and $\hat{\beta}_2$. In statistics the precision of an estimate is measured by its standard error (se).¹⁵ Given the Gaussian assumptions, it is shown
in Appendix 3A, Section 3A.3 that the standard errors of the OLS estimates can be obtained as follows: 


¹³Milton Friedman, Essays in Positive Economics, University of Chicago Press, Chicago, 1953, p. 14.


¹⁴Mark Blaug, The Methodology of Economics: Or How Economists Explain, 2d ed., Cambridge University Press, New York, 1992.


¹⁵The standard error is nothing but the standard deviation of the sampling distribution of the estimator, and the
sampling distribution of an estimator is simply a probability or frequency distribution of the estimator, that is, a distribution 
of the set of values of the estimator obtained from all possible samples of the same size from a given population. Sampling 
distributions are used to draw inferences about the values of the population parameters on the basis of the values of the 
estimators calculated from one or more samples. (For details, see Appendix A.) 
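$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_i^2} \tag{3.3.1}$$
$$\operatorname{se}(\hat{\beta}_2) = \frac{\sigma}{\sqrt{\sum x_i^2}} \tag{3.3.2}$$
$$\operatorname{var}(\hat{\beta}_1) = \frac{\sum X_i^2}{n\sum x_i^2}\,\sigma^2 \tag{3.3.3}$$
$$\operatorname{se}(\hat{\beta}_1) = \sqrt{\frac{\sum X_i^2}{n\sum x_i^2}}\,\sigma \tag{3.3.4}$$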


where var = variance and se = standard error and where $\sigma^2$ is the constant or homoscedastic variance of $u_i$ of Assumption 4.

All the quantities entering into the preceding equations except $\sigma^2$ can be estimated from the data. As shown in Appendix 3A, Section 3A.5, $\sigma^2$ itself is estimated by the following formula:


$$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - 2} \tag{3.3.5}$$


where $\hat{\sigma}^2$ is the OLS estimator of the true but unknown $\sigma^2$ and where the expression n − 2 is known as the number of degrees of freedom (df), $\sum \hat{u}_i^2$ being the sum of the residuals squared or the residual sum of squares (RSS).¹⁶

Once $\sum \hat{u}_i^2$ is known, $\hat{\sigma}^2$ can be easily computed. $\sum \hat{u}_i^2$ itself can be computed either from Eq. (3.1.2) or from the following expression (see Section 3.5 for the proof):


$$\sum \hat{u}_i^2 = \sum y_i^2 - \hat{\beta}_2^2 \sum x_i^2 \tag{3.3.6}$$


Compared with Eq. (3.1.2), Equation 3.3.6 is easy to use, for it does not require computing $\hat{u}_i$ for each observation, although such a computation will be useful in its own right (as we shall see in Chapters 11 and 12).

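Since
$$\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2}$$
an alternative expression for computing $\sum \hat{u}_i^2$ is
$$\sum \hat{u}_i^2 = \sum y_i^2 - \frac{\left(\sum x_i y_i\right)^2}{\sum x_i^2} \tag{3.3.7}$$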
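The formulas of this section can be applied to the assumed Table 3.1 data (Y = (4, 5, 7, 12), X = (1, 4, 5, 6)) in a few lines. The sketch below computes the RSS three ways: directly, from Eq. (3.3.6), and from Eq. (3.3.7). It then computes $\hat{\sigma}^2$ from Eq. (3.3.5) and the standard errors from Eqs. (3.3.1) through (3.3.4), with $\sigma^2$ replaced by its estimate. Variable names are illustrative.

```python
import math

# Assumed Table 3.1 data.
Y = [4, 5, 7, 12]
X = [1, 4, 5, 6]
n = len(Y)
x_bar, y_bar = sum(X) / n, sum(Y) / n
x = [Xi - x_bar for Xi in X]          # deviations x_i
y = [Yi - y_bar for Yi in Y]          # deviations y_i
sxx = sum(v * v for v in x)
sxy = sum(a * b for a, b in zip(x, y))
syy = sum(v * v for v in y)

b2 = sxy / sxx                        # Eq. (3.1.6)
b1 = y_bar - b2 * x_bar               # Eq. (3.1.7)

# Residual sum of squares three ways
rss_direct = sum((Yi - b1 - b2 * Xi) ** 2 for Yi, Xi in zip(Y, X))
rss_336 = syy - b2 ** 2 * sxx         # Eq. (3.3.6)
rss_337 = syy - sxy ** 2 / sxx        # Eq. (3.3.7)

sigma2_hat = rss_direct / (n - 2)     # Eq. (3.3.5), df = n - 2
var_b2 = sigma2_hat / sxx             # Eq. (3.3.1) with sigma^2 estimated
var_b1 = sum(Xi ** 2 for Xi in X) / (n * sxx) * sigma2_hat   # Eq. (3.3.3)
se_b2 = math.sqrt(var_b2)             # Eq. (3.3.2)
se_b1 = math.sqrt(var_b1)             # Eq. (3.3.4)

print(f"RSS: {rss_direct:.4f}, {rss_336:.4f}, {rss_337:.4f}")
print(f"sigma^2-hat = {sigma2_hat:.4f}")
print(f"se(b1) = {se_b1:.4f}, se(b2) = {se_b2:.4f}")
```

The three RSS values agree (about 12.214, the minimum that experiment 1 came close to). Then $\hat{\sigma}^2 \approx 12.214/2 \approx 6.107$, with n − 2 = 2 degrees of freedom.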


¹⁶The term number of degrees of freedom means the total number of observations in the sample (= n) less the number of independent (linear) constraints or restrictions put on them. In other words, it is the number of independent observations out of a total of n observations. For example, before the RSS (3.1.2) can be computed, $\hat{\beta}_1$ and $\hat{\beta}_2$ must first be obtained. These two estimates therefore put two restrictions on the RSS. Therefore, there are n − 2, not n, independent observations to compute the RSS. Following this logic, in the three-variable regression RSS will have n − 3 df, and for the k-variable model it will have n − k df. The general rule is this: df = (n − number of parameters estimated).
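Why divide the RSS by n − 2 rather than n? A Monte Carlo sketch gives some intuition. It holds X fixed at the assumed Table 3.1 values (Assumption 2) and draws many samples from a hypothetical PRF with normal errors ($\beta_1 = 1.5$, $\beta_2 = 1.4$, $\sigma = 2$, all our choices). It refits OLS each time and compares the average of RSS/(n − 2) and of RSS/n with the true $\sigma^2$. It also compares the simulated variance of $\hat{\beta}_2$ and the covariance of $\hat{\beta}_1$ and $\hat{\beta}_2$ with Eqs. (3.3.1) and (3.3.9).

```python
import random
import statistics

random.seed(3)  # reproducible draws

# Hypothetical PRF parameters (our choices); X fixed in repeated samples.
beta1, beta2, sigma = 1.5, 1.4, 2.0
X = [1, 4, 5, 6]                      # assumed Table 3.1 X values
n = len(X)
x_bar = sum(X) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)


def ols_fit(Y):
    """Return (b1, b2, RSS) for one sample via Eqs. (3.1.6) and (3.1.7)."""
    y_bar = sum(Y) / n
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx
    b1 = y_bar - b2 * x_bar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(X, Y))
    return b1, b2, rss


reps = 20000
b1s, b2s, s2_df, s2_n = [], [], [], []
for _ in range(reps):
    Y = [beta1 + beta2 * xi + random.gauss(0.0, sigma) for xi in X]
    b1, b2, rss = ols_fit(Y)
    b1s.append(b1)
    b2s.append(b2)
    s2_df.append(rss / (n - 2))       # Eq. (3.3.5): divide by df
    s2_n.append(rss / n)              # naive alternative: divide by n

m1, m2 = statistics.mean(b1s), statistics.mean(b2s)
cov_sim = sum((a - m1) * (b - m2) for a, b in zip(b1s, b2s)) / (reps - 1)

print("true sigma^2:", sigma ** 2)
print("average RSS/(n - 2):", round(statistics.mean(s2_df), 3))
print("average RSS/n:", round(statistics.mean(s2_n), 3))
print("var(b2), simulated vs (3.3.1):", round(statistics.variance(b2s), 4), round(sigma ** 2 / sxx, 4))
print("cov(b1, b2), simulated vs (3.3.9):", round(cov_sim, 4), round(-x_bar * sigma ** 2 / sxx, 4))
```

With n = 4, the average of RSS/(n − 2) should land close to the true value 4, whereas RSS/n settles near 2. The reason is that under these assumptions E(RSS) = (n − 2)σ², so dividing by n understates σ². The simulated covariance is negative, as expected when $\bar{X} > 0$.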
In passing, note that the positive square root of $\hat{\sigma}^2$,

$$\hat{\sigma} = \sqrt{\frac{\sum \hat{u}_i^2}{n - 2}} \tag{3.3.8}$$

is known as the standard error of estimate or the standard error of the regression (se). It is simply the standard deviation of the Y values about the estimated regression line and is often used as a summary measure of the “goodness of fit” of the estimated regression line, a topic discussed in Section 3.5.
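The shortcut formulas can be checked numerically. The Python sketch below is a minimal illustration on six made-up (X, Y) pairs, not data from the text. It computes the RSS three ways: directly from the residuals as in Eq. (3.1.2), from Eq. (3.3.6), and from Eq. (3.3.7). It then forms $\hat{\sigma}^2$ from Eq. (3.3.5) and $\hat{\sigma}$ from Eq. (3.3.8). The helper names `deviations` and `rss_three_ways` are hypothetical.

```python
# Sketch: three equivalent ways to get the residual sum of squares (RSS)
# in the two-variable model, then sigma^2-hat and the standard error of
# the regression. X and Y are hypothetical illustrative data.

def deviations(values):
    """Deviations from the sample mean (lowercase x_i, y_i in the text)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def rss_three_ways(X, Y):
    x, y = deviations(X), deviations(Y)
    sxx = sum(a * a for a in x)                  # sum of x_i^2
    sxy = sum(a * b for a, b in zip(x, y))       # sum of x_i * y_i
    syy = sum(b * b for b in y)                  # sum of y_i^2
    b2 = sxy / sxx                               # OLS slope
    b1 = sum(Y) / len(Y) - b2 * sum(X) / len(X)  # OLS intercept
    # Direct: square and add every residual u_i = Y_i - b1 - b2 * X_i
    direct = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(X, Y))
    shortcut_336 = syy - b2 ** 2 * sxx           # Eq. (3.3.6)
    shortcut_337 = syy - sxy ** 2 / sxx          # Eq. (3.3.7)
    return direct, shortcut_336, shortcut_337


X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
n = len(X)

rss, rss_336, rss_337 = rss_three_ways(X, Y)
sigma2_hat = rss / (n - 2)      # Eq. (3.3.5): n - 2 degrees of freedom
sigma_hat = sigma2_hat ** 0.5   # Eq. (3.3.8): standard error of the regression
print(f"RSS = {rss:.6f} = {rss_336:.6f} = {rss_337:.6f}")
print(f"sigma^2-hat = {sigma2_hat:.6f}, sigma-hat = {sigma_hat:.6f}")
```

The three RSS values agree up to floating-point rounding. Dividing by $n - 2 = 4$ rather than by $n$ reflects the two degrees of freedom used up in estimating $\hat{\beta}_1$ and $\hat{\beta}_2$.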

Earlier we noted that, given $X_i$, $\sigma^2$ represents the (conditional) variance of both $u_i$ and $Y_i$. Therefore, the standard error of the estimate can also be called the (conditional) standard deviation of $u_i$ and $Y_i$. Of course, as usual, $\sigma_Y^2$ and $\sigma_Y$ represent, respectively, the unconditional variance and unconditional standard deviation of Y.

Note the following features of the variances (and therefore the standard errors) of $\hat{\beta}_1$ and $\hat{\beta}_2$.

1. The variance of $\hat{\beta}_2$ is directly proportional to $\sigma^2$ but inversely proportional to $\sum x_i^2$. That is, given $\sigma^2$, the larger the variation in the X values, the smaller the variance of $\hat{\beta}_2$ and hence the greater the precision with which $\beta_2$ can be estimated. In short, given $\sigma^2$, if there is substantial variation in the X values, $\beta_2$ can be measured more accurately than when the $X_i$ do not vary substantially. Also, given $\sum x_i^2$, the larger $\sigma^2$ is, the larger the variance of $\hat{\beta}_2$. Note that as the sample size n increases, the number of terms in the sum, $\sum x_i^2$, will increase. As n increases, the precision with which $\beta_2$ can be estimated also increases. (Why?)

2. The variance of $\hat{\beta}_1$ is directly proportional to $\sigma^2$ and $\sum X_i^2$ but inversely proportional to $\sum x_i^2$ and the sample size n.

3. Since $\hat{\beta}_1$ and $\hat{\beta}_2$ are estimators, they will not only vary from sample to sample but in a given sample they are likely to be dependent on each other, this dependence being measured by the covariance between them. It is shown in Appendix 3A, Section 3A.4 that

$$\operatorname{cov}(\hat{\beta}_1, \hat{\beta}_2) = -\bar{X}\,\operatorname{var}(\hat{\beta}_2) = -\bar{X}\left(\frac{\sigma^2}{\sum x_i^2}\right) \tag{3.3.9}$$


Since $\operatorname{var}(\hat{\beta}_2)$ is always positive, as is the variance of any variable, the nature of the covariance between $\hat{\beta}_1$ and $\hat{\beta}_2$ depends on the sign of $\bar{X}$. If $\bar{X}$ is positive, then as the formula shows, the covariance will be negative. Thus, if the slope coefficient $\beta_2$ is overestimated (i.e., the slope is too steep), the intercept coefficient $\beta_1$ will be underestimated (i.e., the intercept will be too small). Later on (especially in the chapter on multicollinearity, Chapter 10), we will see the utility of studying the covariances between the estimated regression coefficients.
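Features 1 through 3 can be seen by plugging numbers into the variance formulas. The following Python sketch evaluates three quantities for three hypothetical X designs, treating $\sigma^2 = 4$ as known:

- $\operatorname{var}(\hat{\beta}_2) = \sigma^2/\sum x_i^2$
- $\operatorname{var}(\hat{\beta}_1) = \sigma^2 \sum X_i^2 / (n \sum x_i^2)$
- the covariance in Eq. (3.3.9)

The designs, the value of $\sigma^2$, and the function name `ols_variances` are illustrative assumptions.

```python
# Sketch of the two-variable OLS variance formulas, treating sigma^2 as known
# (in practice it is replaced by sigma^2-hat = RSS / (n - 2)):
#   var(b2)     = sigma^2 / sum x_i^2
#   var(b1)     = sigma^2 * sum X_i^2 / (n * sum x_i^2)
#   cov(b1, b2) = -Xbar * var(b2)                        (Eq. 3.3.9)

def ols_variances(X, sigma2):
    n = len(X)
    xbar = sum(X) / n
    sxx = sum((xi - xbar) ** 2 for xi in X)      # sum of x_i^2
    var_b2 = sigma2 / sxx
    var_b1 = sigma2 * sum(xi ** 2 for xi in X) / (n * sxx)
    cov_b1_b2 = -xbar * var_b2
    return var_b1, var_b2, cov_b1_b2


sigma2 = 4.0                               # hypothetical error variance
narrow = [9.0, 9.5, 10.0, 10.5, 11.0]      # X values bunched around 10
wide = [2.0, 6.0, 10.0, 14.0, 18.0]        # same mean, far more spread
doubled = wide * 2                         # same X design, twice as many observations

for label, X in [("narrow", narrow), ("wide", wide), ("wide, n doubled", doubled)]:
    v1, v2, c = ols_variances(X, sigma2)
    print(f"{label:16s} var(b1)={v1:8.4f} var(b2)={v2:.4f} cov={c:8.4f}")
```

For these designs, moving from the narrow X values to the wide ones cuts $\operatorname{var}(\hat{\beta}_2)$ from 1.6 to 0.025. Repeating the wide design twice (doubling n) halves it again, to 0.0125. Since $\bar{X} = 10 > 0$ in every design, the covariance between $\hat{\beta}_1$ and $\hat{\beta}_2$ is negative, as Eq. (3.3.9) predicts.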

How do the variances and standard errors of the estimated regression coefficients enable one to judge the 
reliability of these estimates? This is a problem in statistical inference, and it will be pursued in Chapters 4 
and 5. 


3.4 Properties of Least-Squares Estimators: The Gauss–Markov Theorem¹⁷


As noted earlier, given the assumptions of the classical linear regression model, the least-squares estimates possess some ideal or optimum properties. These properties are contained in the well-known Gauss–Markov theorem. To understand this theorem, we need to consider the best linear unbiasedness property of an estimator.¹⁸

17Although known as the Gauss–Markov theorem, the least-squares approach of Gauss antedates (1821) the minimum-variance approach of Markov (1900).

As explained in Appendix A, an estimator, say the OLS estimator $\hat{\beta}_2$, is said to be a best linear unbiased estimator (BLUE) of $\beta_2$ if the following hold:


1. It is linear, that is, a linear function of a random variable, such as the dependent variable Y in the regression model.
2. It is unbiased, that is, its average or expected value, $E(\hat{\beta}_2)$, is equal to the true value, $\beta_2$.
3. It has minimum variance in the class of all such linear unbiased estimators; an unbiased estimator with the least variance is known as an efficient estimator.


In the regression context it can be proved that the OLS estimators are BLUE. This is the gist of the famous Gauss–Markov theorem, which can be stated as follows:


Gauss–Markov Theorem

Given the assumptions of the classical linear regression model, the least-squares estimators, in the class of unbiased linear estimators, have minimum variance, that is, they are BLUE.


The proof of this theorem is sketched in Appendix 3A, Section 3A.6. The full import of the Gauss–Markov theorem will become clearer as we move along. It is sufficient to note here that the theorem has theoretical as well as practical importance.¹⁹

What all this means can be explained with the aid of Figure 3.7.

In Figure 3.7(a) we have shown the sampling distribution of the OLS estimator $\hat{\beta}_2$, that is, the distribution of the values taken by $\hat{\beta}_2$ in repeated sampling experiments (recall Table 3.1). For convenience we have assumed $\hat{\beta}_2$ to be distributed symmetrically (but more on this in Chapter 4). As the figure shows, the mean of the $\hat{\beta}_2$ values, $E(\hat{\beta}_2)$, is equal to the true $\beta_2$. In this situation we say that $\hat{\beta}_2$ is an unbiased estimator of $\beta_2$. In Figure 3.7(b) we have shown the sampling distribution of $\beta_2^*$, an alternative estimator of $\beta_2$ obtained by using another (i.e., other than OLS) method. For convenience, assume that $\beta_2^*$, like $\hat{\beta}_2$, is unbiased, that is, its average or expected value is equal to $\beta_2$. Assume further that both $\hat{\beta}_2$ and $\beta_2^*$ are linear estimators, that is, they are linear functions of Y. Which estimator, $\hat{\beta}_2$ or $\beta_2^*$, would you choose?


Figure 3.7 Sampling distribution of OLS estimator $\hat{\beta}_2$ and alternative estimator $\beta_2^*$: (a) sampling distribution of $\hat{\beta}_2$, with $E(\hat{\beta}_2) = \beta_2$; (b) sampling distribution of $\beta_2^*$, with $E(\beta_2^*) = \beta_2$; (c) sampling distributions of $\hat{\beta}_2$ and $\beta_2^*$.


18The reader should refer to Appendix A for the importance of linear estimators as well as for a general discussion of the desirable properties of statistical estimators.


19For example, it can be proved that any linear combination of the β's, such as $(\beta_1 - 2\beta_2)$, can be estimated by $(\hat{\beta}_1 - 2\hat{\beta}_2)$, and this estimator is BLUE. For details, see Henri Theil, Introduction to Econometrics, Prentice-Hall, Englewood Cliffs, N.J., 1978, pp. 401–402. Note a technical point about the Gauss–Markov theorem: It provides only the sufficient (but not necessary) condition for OLS to be efficient. I am indebted to Michael McAleer of the University of Western Australia for bringing this point to my attention.
To answer this question, superimpose the two figures, as in Figure 3.7(c). It is obvious that although both $\hat{\beta}_2$ and $\beta_2^*$ are unbiased, the distribution of $\beta_2^*$ is more diffused or widespread around the mean value than the distribution of $\hat{\beta}_2$. In other words, the variance of $\beta_2^*$ is larger than the variance of $\hat{\beta}_2$. Now given two estimators that are both linear and unbiased, one would choose the estimator with the smaller variance because it is more likely to be close to $\beta_2$ than the alternative estimator. In short, one would choose the BLUE estimator.
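The comparison in Figure 3.7 can be mimicked with a small simulation. The Python sketch below uses the slope through the first and last observations, $(Y_n - Y_1)/(X_n - X_1)$, as a hypothetical stand-in for $\beta_2^*$. This estimator is linear in Y and unbiased under the CLRM. The following settings are assumptions for illustration only:

- the true parameters $\beta_1 = 1$ and $\beta_2 = 0.5$
- the X values 1 through 10
- normal errors with $\sigma = 1$
- the random seed

```python
# Sketch: compare the OLS slope with a hypothetical alternative estimator
# beta2* that uses only the first and last observations. Both are linear in Y
# and unbiased under the CLRM; Gauss-Markov says OLS has the smaller variance.
import random
import statistics


def ols_slope(X, Y):
    xbar, ybar = statistics.fmean(X), statistics.fmean(Y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(X, Y))
    sxx = sum((a - xbar) ** 2 for a in X)
    return sxy / sxx


def endpoint_slope(X, Y):
    # beta2* = (Y_n - Y_1) / (X_n - X_1): linear in Y, and E(beta2*) = beta2
    return (Y[-1] - Y[0]) / (X[-1] - X[0])


def simulate(beta1=1.0, beta2=0.5, sigma=1.0, reps=20000, seed=42):
    rng = random.Random(seed)
    X = [float(k) for k in range(1, 11)]          # fixed regressor values
    ols, alt = [], []
    for _ in range(reps):
        Y = [beta1 + beta2 * x + rng.gauss(0.0, sigma) for x in X]
        ols.append(ols_slope(X, Y))
        alt.append(endpoint_slope(X, Y))
    return ols, alt


ols, alt = simulate()
print("OLS: mean %.4f  variance %.5f" % (statistics.fmean(ols), statistics.variance(ols)))
print("b2*: mean %.4f  variance %.5f" % (statistics.fmean(alt), statistics.variance(alt)))
```

The theoretical variances under these settings are:

- for OLS, $\sigma^2/\sum x_i^2 = 1/82.5 \approx 0.0121$
- for the endpoint estimator, $2\sigma^2/(X_n - X_1)^2 = 2/81 \approx 0.0247$

Both averages should therefore sit near 0.5, while the endpoint estimator's variance is roughly twice as large. This is the pattern pictured in Figure 3.7(c).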

The Gauss–Markov theorem is remarkable in that it makes no assumptions about the probability distribution of the random variable $u_i$, and therefore of $Y_i$ (in the next chapter we will take this up). As long as the assumptions of CLRM are satisfied, the theorem holds. As a result, we need not look for another linear unbiased estimator, for we will not find such an estimator whose variance is smaller than that of the OLS estimator. Of course, if one or more of these assumptions do not hold, the theorem no longer applies. For example, if we consider nonlinear-in-the-parameter regression models (which are discussed in Chapter 14), we may be able to obtain estimators that perform better than the OLS estimators. Also, as we will show in the chapter on heteroscedasticity, if the assumption of homoscedastic variance is not fulfilled, the OLS estimators, although unbiased and consistent, are no longer minimum variance estimators even in the class of linear estimators.

The statistical properties that we have just discussed are known as finite sample properties: These 
properties hold regardless of the sample size on which the estimators are based. Later we will have occasions 
to consider the asymptotic properties, that is, properties that hold only if the sample size is very large 
(technically, infinite). A general discussion of finite-sample and large-sample properties of estimators is given 
in Appendix A. 


3.5 The Coefficient of Determination $r^2$: A Measure of “Goodness of Fit”


Thus far we were concerned with the problem of estimating regression coefficients, their standard errors, and some of their properties. We now consider the goodness of fit of the fitted regression line to a set of data; that is, we shall find out how “well” the sample regression line fits the data. From Figure 3.1 it is clear that if all the observations were to lie on the regression line, we would obtain a “perfect” fit, but this is rarely the case. Generally, there will be some positive $\hat{u}_i$ and some negative $\hat{u}_i$. What we hope for is that these residuals around the regression line are as small as possible. The coefficient of determination $r^2$ (two-variable case) or $R^2$ (multiple regression) is a summary measure that tells how well the sample regression line fits the data.

Before we show how $r^2$ is computed, let us consider a heuristic explanation of $r^2$ in terms of a graphical device, known as the Venn diagram, or the Ballentine, as shown in Figure 3.8.²⁰

In this figure the circle Y represents variation in the dependent variable Y and the circle X represents variation in the explanatory variable X.²¹ The overlap of the two circles (the shaded area) indicates the extent to which the variation in Y is explained by the variation in X (say, via an OLS regression). The greater the extent of the overlap, the greater the share of the variation in Y that is explained by X. The $r^2$ is simply a numerical measure of this overlap. In the figure, as we move from left to right, the area of the overlap increases, that is, successively a greater proportion of the variation in Y is explained by X. In short, $r^2$ increases. When there is no overlap, $r^2$ is obviously zero, but when the overlap is complete, $r^2$ is 1, since 100 percent of the variation in Y is explained by X. As we shall show shortly, $r^2$ lies between 0 and 1.


20See Peter Kennedy, “Ballentine: A Graphical Aid for Econometrics,” Australian Economics Papers, vol. 20, 1981, 
pp. 414-416. The name Ballentine is derived from the emblem of the well-known Ballantine beer with its circles. 

21The terms variation and variance are different. Variation means the sum of squares of the deviations of a variable from its mean value. Variance is this sum of squares divided by the appropriate degrees of freedom. In short, variance = variation/df.
Figure 3.8 The Ballentine view of $r^2$, panels (a) through (f): (a) $r^2 = 0$; (f) $r^2 = 1$.


To compute this $r^2$, we proceed as follows: Recall that

$$Y_i = \hat{Y}_i + \hat{u}_i \tag{2.6.3}$$

or in the deviation form

$$y_i = \hat{y}_i + \hat{u}_i \tag{3.5.1}$$

where use is made of Eqs. (3.1.13) and (3.1.14). Squaring Equation 3.5.1 on both sides and summing over the sample, we obtain


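$$\begin{aligned} \sum y_i^2 &= \sum \hat{y}_i^2 + \sum \hat{u}_i^2 + 2\sum \hat{y}_i \hat{u}_i \\ &= \sum \hat{y}_i^2 + \sum \hat{u}_i^2 \\ &= \hat{\beta}_2^2 \sum x_i^2 + \sum \hat{u}_i^2 \end{aligned} \tag{3.5.2}$$

since $\sum \hat{y}_i \hat{u}_i = 0$ (why?) and $\hat{y}_i = \hat{\beta}_2 x_i$.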

The various sums of squares appearing in Equation 3.5.2 can be described as follows:

- $\sum y_i^2 = \sum (Y_i - \bar{Y})^2$ = total variation of the actual Y values about their sample mean, which may be called the total sum of squares (TSS).
- $\sum \hat{y}_i^2 = \sum (\hat{Y}_i - \bar{\hat{Y}})^2 = \sum (\hat{Y}_i - \bar{Y})^2 = \hat{\beta}_2^2 \sum x_i^2$ = variation of the estimated Y values about their mean ($\bar{\hat{Y}} = \bar{Y}$), which appropriately may be called the sum of squares due to regression [i.e., due to the explanatory variable(s)], or explained by regression, or simply the explained sum of squares (ESS).
- $\sum \hat{u}_i^2$ = residual or unexplained variation of the Y values about the regression line, or simply the residual sum of squares (RSS).

Thus, Eq. (3.5.2) is


$$\text{TSS} = \text{ESS} + \text{RSS} \tag{3.5.3}$$


and shows that the total variation in the observed Y values about their mean value can be partitioned into 
two parts, one attributable to the regression line and the other to random forces because not all actual Y 
observations lie on the fitted line. Geometrically, we have Figure 3.9. 
Figure 3.9 Breakdown of the variation of $Y_i$ into two components: $(Y_i - \bar{Y})$ = total; $(\hat{Y}_i - \bar{Y})$ = due to regression; $\hat{u}_i$ = due to residual.
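The decomposition (3.5.3) rests on the cross-product term $\sum \hat{y}_i \hat{u}_i$ vanishing, which holds for an OLS fit that includes an intercept. The Python sketch below checks both facts on seven hypothetical (X, Y) pairs. The helper `ols_fit` is an illustrative name, not a library function.

```python
# Sketch: verify TSS = ESS + RSS (Eq. 3.5.3) and sum(y_hat_i * u_hat_i) = 0
# for an OLS fit with an intercept. The data are hypothetical.

def ols_fit(X, Y):
    """Return intercept, slope, fitted values, and residuals."""
    n = len(X)
    xbar, ybar = sum(X) / n, sum(Y) / n
    sxx = sum((x - xbar) ** 2 for x in X)
    b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sxx
    b1 = ybar - b2 * xbar
    fitted = [b1 + b2 * x for x in X]
    resid = [y - f for y, f in zip(Y, fitted)]
    return b1, b2, fitted, resid


X = [3.0, 5.0, 7.0, 8.0, 10.0, 12.0, 15.0]
Y = [4.0, 7.5, 7.0, 10.5, 11.0, 14.5, 16.0]
b1, b2, fitted, resid = ols_fit(X, Y)
ybar = sum(Y) / len(Y)

tss = sum((y - ybar) ** 2 for y in Y)          # total variation
ess = sum((f - ybar) ** 2 for f in fitted)     # explained by the regression
rss = sum(u ** 2 for u in resid)               # left in the residuals
cross = sum((f - ybar) * u for f, u in zip(fitted, resid))  # sum of yhat_i * u_i

print(f"TSS = {tss:.4f}, ESS + RSS = {ess + rss:.4f}, cross term = {cross:.2e}")
```

Deviations of the fitted values are taken from $\bar{Y}$ because, with an intercept, the mean of the $\hat{Y}_i$ equals $\bar{Y}$.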
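Now dividing Equation 3.5.3 by TSS on both sides, we obtain

$$\begin{aligned} 1 &= \frac{\text{ESS}}{\text{TSS}} + \frac{\text{RSS}}{\text{TSS}} \\ &= \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} + \frac{\sum \hat{u}_i^2}{\sum (Y_i - \bar{Y})^2} \end{aligned} \tag{3.5.4}$$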
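We now define $r^2$ as

$$r^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \frac{\text{ESS}}{\text{TSS}} \tag{3.5.5}$$

or, alternatively, as

$$r^2 = 1 - \frac{\sum \hat{u}_i^2}{\sum (Y_i - \bar{Y})^2} = 1 - \frac{\text{RSS}}{\text{TSS}} \tag{3.5.5a}$$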


The quantity $r^2$ thus defined is known as the (sample) coefficient of determination and is the most commonly used measure of the goodness of fit of a regression line. Verbally, $r^2$ measures the proportion or percentage of the total variation in Y explained by the regression model.

Two properties of $r^2$ may be noted:

1. It is a nonnegative quantity. (Why?)

2. Its limits are $0 \le r^2 \le 1$. An $r^2$ of 1 means a perfect fit, that is, $\hat{Y}_i = Y_i$ for each i. On the other hand, an $r^2$ of zero means that there is no linear relationship between the regressand and the regressor (i.e., $\hat{\beta}_2 = 0$). In this case, as Eq. (3.1.9) shows, $\hat{Y}_i = \hat{\beta}_1 = \bar{Y}$, that is, the best prediction of any Y value is simply its mean value. In this situation, therefore, the regression line will be parallel to the X axis.
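Although $r^2$ can be computed directly from its definition given in Equation 3.5.5, it can be obtained more quickly from the following formula:

$$r^2 = \frac{\text{ESS}}{\text{TSS}} = \frac{\sum \hat{y}_i^2}{\sum y_i^2} = \frac{\hat{\beta}_2^2 \sum x_i^2}{\sum y_i^2} = \hat{\beta}_2^2 \left(\frac{\sum x_i^2}{\sum y_i^2}\right) \tag{3.5.6}$$

If we divide the numerator and the denominator of Equation 3.5.6 by the sample size n (or $n - 1$ if the sample size is small), we obtain

$$r^2 = \hat{\beta}_2^2 \left(\frac{S_x^2}{S_y^2}\right) \tag{3.5.7}$$

where $S_y^2$ and $S_x^2$ are the sample variances of Y and X, respectively.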
Since $\hat{\beta}_2 = \sum x_i y_i / \sum x_i^2$, Eq. (3.5.6) can also be expressed as

$$r^2 = \frac{\left(\sum x_i y_i\right)^2}{\sum x_i^2 \sum y_i^2} \tag{3.5.8}$$

an expression that may be computationally easy to obtain.
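Formulas (3.5.5) through (3.5.8) are algebraically equivalent, so they should all return the same $r^2$. The Python sketch below computes all five versions on a hypothetical seven-point data set. For Eq. (3.5.7) it uses `statistics.variance`, which divides by $n - 1$. Whatever the divisor, it cancels in the ratio $S_x^2/S_y^2$.

```python
# Sketch: the r^2 formulas (3.5.5), (3.5.5a), (3.5.6), (3.5.7), and (3.5.8)
# give the same number for a two-variable OLS fit. Data are hypothetical.
import statistics

X = [3.0, 5.0, 7.0, 8.0, 10.0, 12.0, 15.0]
Y = [4.0, 7.5, 7.0, 10.5, 11.0, 14.5, 16.0]
xbar, ybar = statistics.fmean(X), statistics.fmean(Y)
x = [v - xbar for v in X]                  # deviations x_i
y = [v - ybar for v in Y]                  # deviations y_i

sxx = sum(a * a for a in x)
syy = sum(b * b for b in y)
sxy = sum(a * b for a, b in zip(x, y))
b2 = sxy / sxx
b1 = ybar - b2 * xbar
fitted = [b1 + b2 * v for v in X]
rss = sum((yo - f) ** 2 for yo, f in zip(Y, fitted))

r2_ess = sum((f - ybar) ** 2 for f in fitted) / syy                  # (3.5.5)  ESS/TSS
r2_rss = 1 - rss / syy                                               # (3.5.5a) 1 - RSS/TSS
r2_beta = b2 ** 2 * sxx / syy                                        # (3.5.6)
r2_var = b2 ** 2 * statistics.variance(X) / statistics.variance(Y)  # (3.5.7)
r2_cross = sxy ** 2 / (sxx * syy)                                    # (3.5.8)

for name, val in [("3.5.5", r2_ess), ("3.5.5a", r2_rss), ("3.5.6", r2_beta),
                  ("3.5.7", r2_var), ("3.5.8", r2_cross)]:
    print(name, round(val, 6))
```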
Given the definition of $r^2$, we can express ESS and RSS discussed earlier as follows:

$$\text{ESS} = r^2 \cdot \text{TSS} = r^2 \sum y_i^2 \tag{3.5.9}$$

$$\begin{aligned} \text{RSS} &= \text{TSS} - \text{ESS} \\ &= \text{TSS}(1 - \text{ESS}/\text{TSS}) \\ &= \sum y_i^2 \,(1 - r^2) \end{aligned} \tag{3.5.10}$$

Therefore, we can write

$$\begin{aligned} \text{TSS} &= \text{ESS} + \text{RSS} \\ \sum y_i^2 &= r^2 \sum y_i^2 + (1 - r^2) \sum y_i^2 \end{aligned} \tag{3.5.11}$$

an expression that we will find very useful later.

A quantity closely related to but conceptually very much different from $r^2$ is the coefficient of correlation, which, as noted in Chapter 1, is a measure of the degree of association between two variables. It can be computed either from

$$r = \pm\sqrt{r^2} \tag{3.5.12}$$
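or from its definition

$$r = \frac{\sum x_i y_i}{\sqrt{\left(\sum x_i^2\right)\left(\sum y_i^2\right)}} = \frac{n \sum X_i Y_i - \left(\sum X_i\right)\left(\sum Y_i\right)}{\sqrt{\left[n \sum X_i^2 - \left(\sum X_i\right)^2\right]\left[n \sum Y_i^2 - \left(\sum Y_i\right)^2\right]}} \tag{3.5.13}$$

which is known as the sample correlation coefficient.²²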

Some of the properties of r are as follows (see Figure 3.10): 

1. It can be positive or negative, the sign depending on the sign of the term in the numerator of Equation 
3.5.13, which measures the sample covariation of two variables. 

2. It lies between the limits of $-1$ and $+1$; that is, $-1 \le r \le 1$.

3. It is symmetrical in nature; that is, the coefficient of correlation between X and Y ($r_{XY}$) is the same as that between Y and X ($r_{YX}$).

4. It is independent of the origin and scale; that is, if we define $X_i^* = aX_i + c$ and $Y_i^* = bY_i + d$, where $a > 0$, $b > 0$, and c and d are constants, then r between $X^*$ and $Y^*$ is the same as that between the original variables X and Y.

5. If X and Y are statistically independent (see Appendix A for the definition), the correlation coefficient between them is zero; but if $r = 0$, it does not mean that the two variables are independent. In other words, zero correlation does not necessarily imply independence. [See Figure 3.10(h).]

6. It is a measure of linear association or linear dependence only; it has no meaning for describing nonlinear relations. Thus in Figure 3.10(h), $Y = X^2$ is an exact relationship yet r is zero. (Why?)

7. Although it is a measure of linear association between two variables, it does not necessarily imply any 
cause-and-effect relationship, as noted in Chapter 1. 
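Properties 3, 4, and 6 can be illustrated numerically. The Python sketch below implements the raw-sum form of Eq. (3.5.13) and applies it to made-up data. The following choices are illustrative assumptions:

- the constants $a = 3$, $b = 0.5$, $c = 10$, and $d = -4$
- the X values symmetric about zero used for the $Y = X^2$ case

```python
# Sketch: the sample correlation coefficient, Eq. (3.5.13), computed from raw
# sums, and checks of properties 3, 4, and 6 on hypothetical data.
import math


def corr(X, Y):
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxy = sum(a * b for a, b in zip(X, Y))
    sxx = sum(a * a for a in X)
    syy = sum(b * b for b in Y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den


X = [1.0, 2.0, 4.0, 5.0, 7.0, 9.0]
Y = [2.0, 2.5, 5.0, 4.5, 8.0, 9.5]

r_xy = corr(X, Y)
r_yx = corr(Y, X)                          # property 3: symmetry
X_star = [3.0 * v + 10.0 for v in X]       # a = 3 > 0, c = 10
Y_star = [0.5 * v - 4.0 for v in Y]        # b = 0.5 > 0, d = -4
r_star = corr(X_star, Y_star)              # property 4: origin and scale

S = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]     # symmetric about zero
r_square = corr(S, [v ** 2 for v in S])        # property 6: exact Y = X^2, yet r = 0

print(round(r_xy, 6), round(r_yx, 6), round(r_star, 6), r_square)
```

For this particular X set, $\sum X_i = 0$ and $\sum X_i^3 = 0$. The numerator of Eq. (3.5.13) is therefore exactly zero, even though Y is an exact function of X.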

In the regression context, $r^2$ is a more meaningful measure than r, for the former tells us the proportion of variation in the dependent variable explained by the explanatory variable(s) and therefore provides an overall measure of the extent to which the variation in one variable determines the variation in the other. The latter does not have such value.²³ Moreover, as we shall see, the interpretation of r (= R) in a multiple regression model is of dubious value. However, we will have more to say about $r^2$ in Chapter 7.

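In passing, note that the $r^2$ defined previously can also be computed as the squared coefficient of correlation between actual $Y_i$ and the estimated $Y_i$, namely, $\hat{Y}_i$. That is, using Eq. (3.5.13), we can write

$$r^2 = \frac{\left[\sum (Y_i - \bar{Y})(\hat{Y}_i - \bar{Y})\right]^2}{\sum (Y_i - \bar{Y})^2 \sum (\hat{Y}_i - \bar{Y})^2}$$

That is,

$$r^2 = \frac{\left(\sum y_i \hat{y}_i\right)^2}{\left(\sum y_i^2\right)\left(\sum \hat{y}_i^2\right)} \tag{3.5.14}$$

where $Y_i$ = actual Y, $\hat{Y}_i$ = estimated Y, and $\bar{Y} = \bar{\hat{Y}}$ = the mean of Y. For proof, see Exercise 3.15. Expression 3.5.14 justifies the description of $r^2$ as a measure of goodness of fit, for it tells how close the estimated Y values are to their actual values.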
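Eq. (3.5.14) can be confirmed in a few lines. The Python sketch below works on hypothetical data. It fits the line by OLS, computes $r^2 = 1 - \text{RSS}/\text{TSS}$, and compares the result with the squared correlation between $Y_i$ and $\hat{Y}_i$. `pearson` is a hypothetical helper implementing the deviation form of Eq. (3.5.13).

```python
# Sketch: check Eq. (3.5.14), r^2 = [corr(Y, Y-hat)]^2, on hypothetical data.
import math


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return num / den


X = [3.0, 5.0, 7.0, 8.0, 10.0, 12.0, 15.0]
Y = [4.0, 7.5, 7.0, 10.5, 11.0, 14.5, 16.0]
xbar, ybar = sum(X) / len(X), sum(Y) / len(Y)
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sum((x - xbar) ** 2 for x in X)
b1 = ybar - b2 * xbar
Y_hat = [b1 + b2 * x for x in X]

r2_fit = 1 - sum((y - f) ** 2 for y, f in zip(Y, Y_hat)) / sum((y - ybar) ** 2 for y in Y)
r2_corr = pearson(Y, Y_hat) ** 2
print(round(r2_fit, 6), round(r2_corr, 6))
```

In the two-variable model both quantities also equal the squared correlation between X and Y, since $\hat{Y}_i$ is a linear function of $X_i$.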


22The population correlation coefficient, denoted by $\rho$, is defined in Appendix A.


23In regression modeling the underlying theory will indicate the direction of causality between Y and X, which, in the context of single-equation models, is generally from X to Y.
Figure 3.10 Correlation patterns (adapted from Henri Theil, Introduction to Econometrics, Prentice-Hall, Englewood Cliffs, NJ, 1978, p. 86): (a) $r = +1$; (b) $r = -1$; (c) r close to +1; (d) r close to −1; (e) r positive but close to zero; (f) r negative but close to zero; (g) and (h) further patterns, (h) being the exact relationship $Y = X^2$ with $r = 0$.


3.6 A Numerical Example 


We illustrate the econometric theory developed so far by considering the data given in Table 2.6, which relates mean hourly wage (Y) and years of schooling (X). Basic labor economics theory tells us that, among many variables, education is an important determinant of wages.
In Table 3.2 we provide the necessary raw data to estimate the quantitative impact of education on wages. 
From the data given in this table, we obtain the estimated regression line as follows: 


$$\hat{Y}_i = -0.0144 + 0.7240X_i \tag{3.6.1}$$


Geometrically, the estimated regression line is as shown in Figure 3.11. 
Table 3.2 Raw Data Based on Table 2.6

Obs   Y          X    x    y          x²    yx
1      4.4567     6   -6   -4.218      36   25.308
2      5.77       7   -5   -2.9047     25   14.5235
3      5.9787     8   -4   -2.696      16   10.784
4      7.3317     9   -3   -1.343       9    4.029
5      7.3182    10   -2   -1.3565      4    2.713
6      6.5844    11   -1   -2.0903      1    2.0903
7      7.8182    12    0   -0.8565      0    0
8      7.8351    13    1   -0.8396      1   -0.8396
9     11.0223    14    2    2.3476      4    4.6952
10    10.6738    15    3    1.9991      9    5.9973
11    10.8361    16    4    2.1614     16    8.6456
12    13.615     17    5    4.9403     25   24.7015
13    13.531     18    6    4.8563     36   29.1378
Sum  112.7712   156    0    0         182  131.7856

Obs   X²     Y²          Ŷ           û           û²
1      36    19.86217     4.165294    0.291406   0.084917
2      49    33.2929      4.916863    0.853137   0.727843
3      64    35.74485     5.668432    0.310268   0.096266
4      81    53.75382     6.420001    0.911699   0.831195
5     100    53.55605     7.17157     0.14663    0.0215
6     121    43.35432     7.923139   -1.33874    1.792222
7     144    61.12425     8.674708   -0.85651    0.733606
8     169    61.38879     9.426277   -1.59118    2.531844
9     196   121.4911     10.17785     0.844454   0.713103
10    225   113.93       10.92941    -0.25562    0.065339
11    256   117.4211     11.68098    -0.84488    0.713829
12    289   185.3682     12.43255     1.182447   1.398181
13    324   183.088      13.18412     0.346878   0.120324
Sum  2054  1083.376     112.7712     ≈ 0        9.83017

Note: $x_i = X_i - \bar{X}$; $y_i = Y_i - \bar{Y}$

$$\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{131.7856}{182.0} = 0.7240967$$

$$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X} = 8.674708 - 0.7240967 \times 12 = -0.01445$$

$$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - 2} = \frac{9.83017}{11} = 0.893652; \quad \hat{\sigma} = 0.945332$$

$$\operatorname{var}(\hat{\beta}_2) = \frac{\hat{\sigma}^2}{\sum x_i^2} = \frac{0.893652}{182.0} = 0.004910; \quad \operatorname{se}(\hat{\beta}_2) = \sqrt{0.004910} = 0.070072$$

$$r^2 = 1 - \frac{\sum \hat{u}_i^2}{\sum y_i^2} = 1 - \frac{9.83017}{105.1188} = 0.9065$$

$$r = \sqrt{r^2} = 0.9521$$

$$\operatorname{var}(\hat{\beta}_1) = \frac{\sum X_i^2}{n \sum x_i^2}\,\hat{\sigma}^2 = \frac{2054}{13(182)}(0.893652) = 0.868132 \times 0.893652 = 0.775808; \quad \operatorname{se}(\hat{\beta}_1) = \sqrt{0.775808} = 0.8808$$
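As a cross-check on the note to Table 3.2, the Python sketch below recomputes the regression from the 13 (X, Y) pairs in the table using the textbook formulas. It reproduces $\hat{\beta}_2 \approx 0.7241$ and $\hat{\beta}_1 \approx -0.0145$ of Eq. (3.6.1).

Its residual-based figures, however, come out somewhat different from the printed note:

- RSS: about 9.693 (versus 9.83017)
- $\hat{\sigma}^2$: about 0.881
- $r^2$: about 0.9078 (versus 0.9065)
- $\operatorname{se}(\hat{\beta}_2)$ and $\operatorname{se}(\hat{\beta}_1)$: about 0.0696 and 0.8746

The gap traces to the printed $\hat{Y}_i$ column. It rises by about 0.7516 per additional year of schooling instead of by $\hat{\beta}_2 \approx 0.7241$. The recomputed figures are the sketch's own arithmetic and are worth verifying independently.

```python
# Sketch: recompute the Table 3.2 quantities directly from the 13 (X, Y)
# pairs (years of schooling, mean hourly wage) listed in the table.
import math

X = list(range(6, 19))                      # 6, 7, ..., 18 years of schooling
Y = [4.4567, 5.77, 5.9787, 7.3317, 7.3182, 6.5844, 7.8182,
     7.8351, 11.0223, 10.6738, 10.8361, 13.615, 13.531]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [v - xbar for v in X]                   # deviations x_i
y = [v - ybar for v in Y]                   # deviations y_i

sxx = sum(a * a for a in x)                 # 182
sxy = sum(a * b for a, b in zip(x, y))      # about 131.7856
syy = sum(b * b for b in y)                 # TSS
b2 = sxy / sxx
b1 = ybar - b2 * xbar
resid = [yo - (b1 + b2 * xo) for xo, yo in zip(X, Y)]
rss = sum(u * u for u in resid)
sigma2_hat = rss / (n - 2)
var_b2 = sigma2_hat / sxx
var_b1 = sigma2_hat * sum(v * v for v in X) / (n * sxx)
r2 = 1 - rss / syy

print(f"b1 = {b1:.5f}, b2 = {b2:.7f}")
print(f"RSS = {rss:.5f}, sigma^2-hat = {sigma2_hat:.6f}")
print(f"se(b1) = {math.sqrt(var_b1):.4f}, se(b2) = {math.sqrt(var_b2):.6f}")
print(f"r^2 = {r2:.4f}, r = {math.sqrt(r2):.4f}")
```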
Figure 3.11 Estimated regression line for wage-education data from Table 2.6 (vertical axis: mean hourly wage; horizontal axis: education, in years).


As we know, each point on the regression line gives an estimate of the mean value of Y corresponding to the chosen X value, that is, $\hat{Y}_i$ is an estimate of $E(Y \mid X_i)$. The value of $\hat{\beta}_2 = 0.7240$, which measures the slope of the line, shows that, within the sample range of X between 6 and 18 years of education, as X increases by 1, the estimated increase in mean hourly wages is about 72 cents. That is, each additional year of schooling, on average, increases hourly wages by about 72 cents.

The value of $\hat{\beta}_1 = -0.0144$, which is the intercept of the line, indicates the average level of wages when the level of education is zero. Such a literal interpretation of the intercept does not make any sense in the present case. How could there be negative wages? As we will see throughout this book, very often the intercept term has no viable practical meaning. Besides, a zero level of education lies outside the range of education levels observed in our sample. As we will see in Chapter 5, the observed value of the intercept is not statistically different from zero.

The $r^2$ value of about 0.90 suggests that education explains about 90 percent of the variation in hourly wage. Considering that $r^2$ can be at most 1, our regression line fits the data very well. The coefficient of correlation, $r = 0.9521$, shows that wages and education are highly positively correlated.

Before we leave our example, note that our model is extremely simple. Labor economics theory tells us 
that, besides education, variables such as gender, race, location, labor unions, and language are also important 
factors in the determination of hourly wages. After we study multiple regression in Chapters 7 and 8, we will 
consider a more extended model of wage determination. 
3.7 Illustrative Examples


Example 3.1 Consumption-Income Relationship in India, 1950-51 to 2006-07 


Let us revisit the consumption-income data given in Table I.1 of the Introduction. We have already shown the data in Figure I.3, along with the estimated regression line in Eq. (I.3.3). Now we provide the underlying OLS regression results. Note that Y = Private Final Consumption Expenditure (PFCE) and X = Gross Domestic Product (GDP), both measured in rupee crore at 1999–2000 prices. In this example, the data are time series data.


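$$\hat{Y}_t = 103736.0493 + 0.6303X_t \tag{3.7.1}$$

$\operatorname{var}(\hat{\beta}_1) = 43393430.86$; $\operatorname{se}(\hat{\beta}_1) = 6587.37$
$\operatorname{var}(\hat{\beta}_2) = 0.000036$; $\operatorname{se}(\hat{\beta}_2) = 0.01$
$r^2 = 0.9950$; $\hat{\sigma}^2 = 911106030.36$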


Equation (3.7.1) is the aggregate, or economywide, Keynesian consumption function. As this equation shows, the marginal propensity to consume (MPC) is about 0.63, suggesting that if real income goes up by a rupee, the average private final consumption expenditure goes up by about 63 paise. According to Keynesian theory, MPC is expected to lie between 0 and 1.

The intercept value in this example is positive, indicating that if the value of GDP were zero, the average level of private final consumption expenditure would be about 104 thousand crore rupees.

The $r^2$ value of 0.9950 means that about 99.5 percent of the variation in private final consumption expenditure is explained by variation in GDP. This value is quite high, considering that $r^2$ can at most be 1. As we will see throughout this book, in regressions involving time series data one generally obtains high $r^2$ values. We will explore the reasons behind this in the chapter on autocorrelation and also in the chapter on time series econometrics.


Example 3.2 Food Expenditure in India 


Refer to the data given in Table 2.8 of Exercise 2.15. The data relate to a sample of 55 rural households in 
India. The regressand in this example is expenditure on food and the regressor is total expenditure, a proxy 
for income, both figures in rupees. The data in this example are thus cross-sectional data. 


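On the basis of the given data, we obtained the following regression:

$$\widehat{\text{FoodExp}}_i = 94.2087 + 0.4368\,\text{TotalExp}_i \tag{3.7.2}$$

$\operatorname{var}(\hat{\beta}_1) = 2560.9401$; $\operatorname{se}(\hat{\beta}_1) = 50.8563$
$\operatorname{var}(\hat{\beta}_2) = 0.0061$; $\operatorname{se}(\hat{\beta}_2) = 0.0783$
$r^2 = 0.3698$; $\hat{\sigma}^2 = 4469.6913$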


From Equation 3.7.2 we see that if total expenditure increases by 1 rupee, on average, expenditure on food 
goes up by about 44 paise (1 rupee = 100 paise). If total expenditure were zero, the average expenditure 
on food would be about 94 rupees. Again, such a mechanical interpretation of the intercept may not be 
meaningful. However, in this example one could argue that even if total expenditure is zero (e.g., because of 
loss of a job), people may still maintain some minimum level of food expenditure by borrowing money or by 
dissaving. 

The $r^2$ value of about 0.37 means that only 37 percent of the variation in food expenditure is explained by total expenditure. This might seem a rather low value, but as we will see throughout this text, in cross-sectional data one typically obtains low $r^2$ values, possibly because of the diversity of the units in the sample. We will discuss this topic further in the chapter on heteroscedasticity (see Chapter 11).
Example 3.3 Demand for Cellular Phones and Personal Computers in Relation to Per Capita Personal Income


Table 3.3 gives data on the number of cell phone subscribers and the number of personal computers (PCs), 
both per 100 persons, and the purchasing-power adjusted per capita income in dollars for a sample of 34 
countries. Thus we have cross-sectional data. These data are for the year 2003 and are obtained from the 
Statistical Abstract of the United States, 2006. 

Although cell phones and personal computers are used extensively in the United States, that is not the 
case in many countries. To see if per capita income is a factor in the use of cell phones and PCs, we regressed 
each of these means of communication on per capita income using the sample of 34 countries. The results 
are as follows: 


Table 3.3 Number of Cellular Phone Subscribers per Hundred Persons and Number of Personal Computers per 100 Persons and Per Capita Income in Selected Countries for 2003

Country          Cellphone    PCs     Per Capita Income ($)
Argentina          17.76      8.2     11410
Australia          71.95     60.18    28780
Belgium            79.28     31.81    28920
Brazil             26.36      7.48     7510
Bulgaria           46.64      5.19       75.4
Canada             41.9      48.7     30040
China              21.48      2.76     4980
Colombia           14.13      4.93     6410
Czech Republic     96.46     17.74    15600
Ecuador            18.92      3.24     3940
Egypt               8.45      2.91     3940
France             69.59     34.71    27640
Germany            78.52     48.47    27610
Greece             90.23      8.17    19900
Guatemala          13.15      1.44     4090
Hungary            76.88     10.84    13840
India               2.47      0.72     2880
Indonesia           8.74      1.19     3210
Italy             101.76     23.07    26830
Japan              67.9      38.22    28450
Mexico             29.47      8.3      8980
Netherlands        76.76     46.66    28560
Pakistan            1.75      0.42     2040
Poland             45.09     14.2     11210
Russia             24.93      8.87     8950
Saudi Arabia       32.11     13.67    13230
South Africa       36.36      7.26    10130
Spain              91.61     19.6     22150
Sweden             98.05     62.13    26710
Switzerland        84.34     70.87    32220
Thailand           39.42      3.98     7450
U.K.               91.17     40.57    27690
U.S.               54.58     65.98    37750
Venezuela          27.3       6.09     4750


Note: The data on cell phones and personal computers are per 100 persons. 
Source: Statistical Abstract of the United States, 2006, Table 1364 for data on cell phones and computers and Table 1327 
for purchasing-power adjusted per capita income. 
Demand for Cell Phones. Letting Y = number of cell phone subscribers and X = purchasing-power-adjusted per capita income, we obtained the following regression:


$$\hat{Y}_i = 14.4773 + 0.0022X_i \tag{3.7.3}$$

$\operatorname{se}(\hat{\beta}_1) = 6.1523$; $\operatorname{se}(\hat{\beta}_2) = 0.00032$
$r^2 = 0.6023$


The slope coefficient suggests that if per capita income goes up by, say, 1,000 dollars, on average, the number of cell phone subscribers goes up by about 2.2 per 100 persons. The intercept value of about 14.47 suggests that even if the per capita income is zero, the average number of cell phone subscribers is about 14 per 100 persons. Again, this interpretation may not have much meaning, for in our sample we do not have any country with zero per capita income. The $r^2$ value is moderately high. But notice that our sample includes a variety of countries with varying levels of income. In such a diverse sample we would not expect a very high $r^2$ value.

After we study Chapter 5, we will show how the estimated standard errors reported in Equation 3.7.3 can 
be used to assess the statistical significance of the estimated coefficients. 


Demand for Personal Computers. Although the prices of personal computers have come down substan- 
tially over the years, PCs are still not ubiquitous. An important determinant of the demand for personal 
computers is personal income. Another determinant is price, but we do not have comparative data on PC 
prices for the countries in our sample. 

Letting Y denote the number of PCs and X the per capita income, we have the following “partial” demand 
for the PCs (partial because we do not have comparative price data or data on other variables that might affect 
the demand for the PCs). 


$$\hat{Y}_i = -6.5833 + 0.0018X_i \tag{3.7.4}$$

$\operatorname{se}(\hat{\beta}_1) = 2.7437$; $\operatorname{se}(\hat{\beta}_2) = 0.00014$
$r^2 = 0.8290$


As these results suggest, per capita personal income has a positive relationship to the demand for PCs. After we study Chapter 5, you will see that, statistically, per capita personal income is an important determinant of the demand for PCs. The negative value of the intercept in the present instance has no practical significance. Despite the diversity of our sample, the estimated $r^2$ value is quite high. The interpretation of the slope coefficient is that if per capita income increases by, say, 1,000 dollars, on average, the demand for personal computers goes up by about 2 units (more precisely, 1.8) per 100 persons.

Even though the use of personal computers is spreading quickly, there are many countries that still use mainframe computers. Therefore, the total usage of computers in those countries may be much higher than that indicated by the sale of PCs.


3.8 A Note on Monte Carlo Experiments 


In this chapter we showed that under the assumptions of CLRM the least-squares estimators have certain desirable statistical features summarized in the BLUE property. In the appendix to this chapter we prove this property more formally. But in practice how does one know that the BLUE property holds? For example, how does one find out if the OLS estimators are unbiased? The answer is provided by the so-called Monte Carlo experiments, which are essentially computer simulation, or sampling, experiments.

To introduce the basic ideas, consider our two-variable PRF: 


$$Y_i = \beta_1 + \beta_2 X_i + u_i \tag{3.8.1}$$
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A Monte Carlo experiment proceeds as follows: 

1. Suppose the true values of the parameters are as follows: B, = 20 and B, = 0.6. 

2. You choose the sample size, say n = 25. 

3. You fix the values of X for each observation. In all you will have 25 X values. 

4. Suppose you go to a random number table, choose 25 values, and call them u; (these days most 
statistical packages have built-in random number generators).”* 

5. Since you know B,, B>, X; and u, using Equation 3.8.1 you obtain 25 Y, values. 

6. Now using the 25 Y, values thus generated, you regress these on the 25 X values chosen in step 3, 
obtaining Â; and fy, the least-squares estimators. 

7. Suppose you repeat this experiment 99 times, each time using the same B,, 85, and X values. Of course, 
the u; values will vary from experiment to experiment. Therefore, in all you have 100 experiments, thus 
generating 100 values each of 8, and B,. (In practice, many such experiments are conducted, sometimes 
1000 to 2000.) z A 

8. You take the averages of these 100 estimates and call them ĝ, and oe 

9. If these average values are about the same as the true values of B, and B, assumed in step 1, this 
Monte Carlo experiment “establishes” that the least-squares estimators are indeed unbiased. Recall that under 
CLRM £(f;) = A; and E(f2) = Bp. 
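The nine steps above translate almost line by line into code. The following is a minimal sketch in plain Python (standard library only), not a prescribed implementation. The X design (10, 20, ..., 250), the normal distribution for $u_i$ with standard deviation 5, and the random seed are assumptions made for illustration; the text fixes only $\beta_1 = 20$, $\beta_2 = 0.6$, and $n = 25$.

```python
import random
import statistics


def ols(x, y):
    """Two-variable OLS in deviation form; returns (b1_hat, b2_hat)."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2


def monte_carlo(beta1=20.0, beta2=0.6, n=25, reps=100, sigma=5.0, seed=1):
    """Steps 1-8: average the OLS estimates over `reps` simulated samples."""
    rng = random.Random(seed)
    # Step 3: X is fixed once and reused in every replication (hypothetical design).
    x = [10.0 * (i + 1) for i in range(n)]
    b1_draws, b2_draws = [], []
    for _ in range(reps):
        # Step 4: fresh disturbances each time; normality is an assumption here.
        u = [rng.gauss(0.0, sigma) for _ in range(n)]
        # Step 5: generate Y from the PRF (3.8.1).
        y = [beta1 + beta2 * xi + ui for xi, ui in zip(x, u)]
        # Step 6: regress the generated Y on the fixed X.
        b1, b2 = ols(x, y)
        b1_draws.append(b1)
        b2_draws.append(b2)
    # Step 8: the averages of the estimates.
    return statistics.fmean(b1_draws), statistics.fmean(b2_draws)


if __name__ == "__main__":
    b1_bar, b2_bar = monte_carlo(reps=1000)
    print(f"average b1_hat = {b1_bar:.3f}, average b2_hat = {b2_bar:.3f}")
```

Step 9 is then the comparison of the printed averages with 20 and 0.6. With more replications the averages should settle closer to the true values, although any single run differs slightly depending on the seed.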

These steps characterize the general nature of the Monte Carlo experiments. Such experiments are often
used to study the statistical properties of various methods of estimating population parameters. They are
particularly useful for studying the behavior of estimators in small, or finite, samples. These experiments are also
an excellent means of driving home the concept of repeated sampling that is the basis of most of classical
statistical inference, as we shall see in Chapter 5. We shall provide several examples of Monte Carlo experiments
by way of exercises for classroom assignment. (See Exercise 3.27.)


Summary and Conclusions 


The important topics and concepts developed in this chapter can be summarized as follows. 

1. The basic framework of regression analysis is the CLRM. 

2. The CLRM is based on a set of assumptions. 

3. Based on these assumptions, the least-squares estimators take on certain properties summarized in the
Gauss-Markov theorem, which states that in the class of linear unbiased estimators, the least-squares
estimators have minimum variance. In short, they are BLUE.

4. The precision of OLS estimators is measured by their standard errors. In Chapters 4 and 5 we shall
see how the standard errors enable one to draw inferences on the population parameters, the $\beta$ coefficients.

5. The overall goodness of fit of the regression model is measured by the coefficient of determination,
$r^2$. It tells what proportion of the variation in the dependent variable, or regressand, is explained by the
explanatory variable, or regressor. This $r^2$ lies between 0 and 1; the closer it is to 1, the better is the fit.

6. A concept related to the coefficient of determination is the coefficient of correlation, r. It is a measure
of linear association between two variables and it lies between -1 and +1.


$^{24}$In practice it is assumed that $u_i$ follows a certain probability distribution, say, normal, with certain parameters (e.g.,
the mean and variance). Once the values of the parameters are specified, one can easily generate the $u_i$ using statistical
packages.

7. The CLRM is a theoretical construct or abstraction because it is based on a set of assumptions that may
be stringent or “unrealistic.” But such abstraction is often necessary in the initial stages of studying any 
field of knowledge. Once the CLRM is mastered, one can find out what happens if one or more of its 
assumptions are not satisfied. The first part of this book is devoted to studying the CLRM. The other 
parts of the book consider the refinements of the CLRM. Table 3.4 gives the road map ahead. 


Table 3.4 What Happens If the Assumptions of CLRM Are Violated?

Assumption Number   Type of Violation                                         Where to Study?
1                   Nonlinearity in parameters                                Chapter 14
2                   Stochastic regressor(s)                                   Chapter 13
3                   Nonzero mean of $u_i$                                     Introduction to Part II
4                   Heteroscedasticity                                        Chapter 11
5                   Autocorrelated disturbances                               Chapter 12
6                   Sample observations less than the number of regressors    Chapter 10
7                   Insufficient variability in regressors                    Chapter 10
8                   Multicollinearity*                                        Chapter 10
9                   Specification bias*                                       Chapters 13, 14
10**                Nonnormality of disturbances                              Chapter 13

*These assumptions will be introduced in Chapter 7, when we discuss the multiple regression model.
**Note: The assumption that the disturbances $u_i$ are normally distributed is not a part of the CLRM. But more on this in Chapter 4.
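Summary points 5 and 6 can be made concrete with a few lines of code. The sketch below, in plain Python with made-up data (the five (X, Y) pairs are purely illustrative, not from the text), fits the two-variable regression, computes $r^2$ as the explained sum of squares divided by the total sum of squares, and computes r from the deviation form. In the two-variable model, squaring r should reproduce $r^2$.

```python
import math


def r2_and_r(x, y):
    """Return (r^2, r) for the two-variable regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    dy = [yi - y_bar for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    tss = sum(b * b for b in dy)                  # total sum of squares
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    y_hat = [b1 + b2 * xi for xi in x]
    ess = sum((yh - y_bar) ** 2 for yh in y_hat)  # explained sum of squares
    r2 = ess / tss                                # coefficient of determination
    r = sxy / math.sqrt(sxx * tss)                # coefficient of correlation
    return r2, r


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2, r = r2_and_r(x, y)
print(f"r^2 = {r2:.4f}, r = {r:.4f}, r*r = {r * r:.4f}")
```

For these five points both routes give $r^2 = 0.6$. Reversing the sign of every Y changes the sign of r but leaves $r^2$ unchanged, which is why r carries the direction of the association and $r^2$ does not.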


Multiple Choice Questions 


Choose the best alternative for each question 
1. The population regression function is not directly observable. This is a 
a. True statement 
b. False statement 
c. Mostly true statement depending on the population 
d. Mostly false statement depending on the observation capacity of the researcher
2. In $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$, $\hat{u}_i$ gives the differences between
a. The actual and estimated Y values 
b. The actual and estimated X values 
c. The actual and estimated beta values 
d. The actual and estimated u values 
3. Under the least-squares procedure, the larger the $\hat{u}_i$ (in absolute terms), the larger the
a. standard error 
b. Regression error 
c. Squared sum of residuals 
d. Difference between true parameter and estimated parameter 
4. The method of least squares provides unique estimates of $\beta_1$ and $\beta_2$ that give the smallest possible
value of


A 


aru; 
b. i 
oe DYE 
d. È ù; 


5. … By solving these simultaneous equations we obtain the least-squares estimators.
a. Non-normal equations
b. Normal equations
c. Linear equations
d. Nonlinear equations
6. The least-squares estimators are
a. Period estimators
b. Point estimators
c. Population estimators
d. Popular estimators
7. The mean value of the estimated Y ($\bar{\hat{Y}}$) is
a. Equal to the mean value of actual Y ($\bar{Y}$).
b. Not equal to the mean value of actual Y ($\bar{Y}$).
c. Equal to the mean value of actual X ($\bar{X}$).
d. Not equal to the mean value of actual X ($\bar{X}$).
8. $\sum \hat{u}_i$ and $\sum \hat{u}_i X_i$ are
a. Positive values
b. Negative values
c. Equal to zero
d. Any of the above
9. The mean value of $u_i$ conditional upon the given $X_i$ is
a. Positive values
b. Negative values
c. Equal to zero
d. Any of the above
10. In the classical linear regression model, $X_i$ and $u_i$ are
a. Positively correlated
b. Negatively correlated
c. Highly correlated
d. Not correlated
11. Homoscedasticity refers to the error terms having
a. Zero mean
b. Positive variance
c. Constant variance
d. Positive mean
12. One of the assumptions of the CLRM is that the values of the explanatory variable X must
a. All be positive
b. Not all be the same
c. All be negative
d. Average to zero
13. In statistics, the standard error measures the
a. Precision of an estimate
b. Correlation between Y and X
c. Specification error of the model
d. Autocorrelation in the regression model
14. One of these is not a part of the classical assumptions:
a. Values taken by the regressand Y are fixed in repeated sampling
b. Regression model is linear in parameters
c. Error term has mean zero
d. Error term has a constant variance
15. In a two-variable linear regression model the slope coefficient measures
a. The mean value of Y
b. The change in Y which the model predicts for a unit change in X
c. The change in X which the model predicts for a unit change in Y
d. The value of Y for any given value of X
16. The fitted regression equation is given by $\hat{Y} = -12 + 0.5X$. What is the value of the residual at the point
$X = 50$, $Y = 70$?
a. 57
b. -57
c. 0
d. 33
17. What is the number of degrees of freedom for a simple bivariate linear regression with 100 observations?
a. 100
b. 97
c. 98
d. 2
18. Given the assumptions of the CLRM, the least-squares estimates possess some optimum properties given by the
Gauss-Markov theorem. Which of these statements is NOT part of the theorem?
a. The estimator $\hat{\beta}_2$ is a linear function of a random variable
b. The average value of the estimator $\hat{\beta}_2$ is equal to zero
c. The estimator $\hat{\beta}_2$ has minimum variance
d. The estimator $\hat{\beta}_2$ is an unbiased estimator
19. The coefficient of determination measures
a. The correlation between X and Y
b. The residual sum of squares as a proportion of the total sum of squares
c. The explained sum of squares as a proportion of the total sum of squares
d. How well the sample regression fits the data
20. The coefficient of correlation
a. Lies between -1 and +1
b. Is always equal to zero
c. Is a measure of nonlinear dependence of two variables
d. Implies causation in a relationship
21. For the coefficient of determination $r^2$ for a regression model
a. r =Y 


a 


b. 
C. 
d. 
23. Zero correlation does not necessarily imply independence between the two variables. The statement is
a. False
b. True
c. Depends on mean value of X and Y
d. Depends on r
24. The $r^2$ measures the percentage of the total variation in
a. X explained by Y
b. Y explained by betas
c. Y explained by $u_i$
d. Y explained by the regression model
25. When $Y_i = \hat{Y}_i$ for each i in a regression model, the value of $r^2$ would be
am =Y 
b. OEP | 
c. $r^2 = 1$
d. $r^2 = 0$
Exercises 
Questions 

3.1. Given the assumptions in column 1 of the table, show that the assumptions in column 2 are equivalent
to them.

Assumptions of the Classical Model

(1)                                      (2)
$E(u_i \mid X_i) = 0$                    $E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i$
$\text{cov}(u_i, u_j) = 0,\ i \ne j$     $\text{cov}(Y_i, Y_j) = 0,\ i \ne j$
$\text{var}(u_i \mid X_i) = \sigma^2$      $\text{var}(Y_i \mid X_i) = \sigma^2$

3.2. Show that the estimates $\hat{\beta}_1 = 1.572$ and $\hat{\beta}_2 = 1.357$ used in the first experiment of Table 3.1 are in fact
the OLS estimators.

3.3. According to Malinvaud (see footnote 11), the assumption that $E(u_i \mid X_i) = 0$ is quite important. To see
this, consider the PRF: $Y_i = \beta_1 + \beta_2 X_i + u_i$. Now consider two situations: (i) $\beta_1 = 0$, $\beta_2 = 1$, and $E(u_i) = 0$;
and (ii) $\beta_1 = 1$, $\beta_2 = 0$, and $E(u_i) = (X_i - 1)$. Now take the expectation of the PRF conditional upon X in
the two preceding cases and see if you agree with Malinvaud about the significance of the assumption
$E(u_i \mid X_i) = 0$.

3.4. Consider the sample regression 


$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$$

Imposing the restrictions (i) $\sum \hat{u}_i = 0$ and (ii) $\sum \hat{u}_i X_i = 0$, obtain the estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ and show
that they are identical with the least-squares estimators given in Eqs. (3.1.6) and (3.1.7). This method
of obtaining estimators is called the analogy principle. Give an intuitive justification for imposing
restrictions (i) and (ii). (Hint: Recall the CLRM assumptions about $u_i$.) In passing, note that the analogy
principle of estimating unknown parameters is also known as the method of moments, in which sample
moments (e.g., the sample mean) are used to estimate population moments (e.g., the population mean).
As noted in Appendix A, a moment is a summary statistic of a probability distribution, such as the
expected value and variance.

3.5. Show that $r^2$ defined in (3.5.5) ranges between 0 and 1. You may use the Cauchy-Schwarz inequality,
which states that for any random variables X and Y the following relationship holds true:

$$[E(XY)]^2 \le E(X^2)E(Y^2)$$

3.6. Let $\hat{\beta}_{YX}$ and $\hat{\beta}_{XY}$ represent the slopes in the regression of Y on X and X on Y, respectively. Show that

$$\hat{\beta}_{YX}\hat{\beta}_{XY} = r^2$$

where r is the coefficient of correlation between X and Y.

3.7. Suppose in Exercise 3.6 that $\hat{\beta}_{YX}\hat{\beta}_{XY} = 1$. Does it matter then if we regress Y on X or X on Y? Explain
carefully.

3.8. Spearman's rank correlation coefficient $r_s$ is defined as follows:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where d = difference in the ranks assigned to the same individual or phenomenon and n = number of
individuals or phenomena ranked. Derive $r_s$ from r defined in Eq. (3.5.13). Hint: Rank the X and Y
values from 1 to n. Note that the sum of X and Y ranks is n(n + 1)/2 each and therefore their means are
(n + 1)/2.

3.9. Consider the following formulations of the two-variable PRF:

Model I: $Y_i = \beta_1 + \beta_2 X_i + u_i$
Model II: $Y_i = \alpha_1 + \alpha_2 (X_i - \bar{X}) + u_i$

a. Find the estimators of $\beta_1$ and $\alpha_1$. Are they identical? Are their variances identical?
b. Find the estimators of $\beta_2$ and $\alpha_2$. Are they identical? Are their variances identical?
c. What is the advantage, if any, of model II over model I?

3.10. Suppose you run the following regression:

$$y_i = \hat{\beta}_1 + \hat{\beta}_2 x_i + \hat{u}_i$$

where, as usual, $y_i$ and $x_i$ are deviations from their respective mean values. What will be the value of
$\hat{\beta}_1$? Why? Will $\hat{\beta}_2$ be the same as that obtained from Eq. (3.1.6)? Why?

3.11. Let $r_1$ = coefficient of correlation between n pairs of values $(Y_i, X_i)$ and $r_2$ = coefficient of correlation
between n pairs of values $(aX_i + b, cY_i + d)$, where a, b, c, and d are constants and a and c have the same sign.
Show that $r_2 = r_1$ and hence establish the principle that the coefficient of correlation is invariant with respect
to the change of scale and the change of origin.
Hint: Apply the definition of r given in Eq. (3.5.13).
Note: The operations $aX_i$, $X_i + b$, and $aX_i + b$ are known, respectively, as the change of scale, change of
origin, and change of both scale and origin.

3.12. If r, the coefficient of correlation between n pairs of values $(X_i, Y_i)$, is positive, then determine whether
each of the following statements is true or false:

a. r between $(-X_i, -Y_i)$ is also positive.

b. r between $(-X_i, Y_i)$ and that between $(X_i, -Y_i)$ can be either positive or negative.

c. Both the slope coefficients $\beta_{yx}$ and $\beta_{xy}$ are positive, where $\beta_{yx}$ = slope coefficient in the regression
of Y on X and $\beta_{xy}$ = slope coefficient in the regression of X on Y.
3.13. If $X_1$, $X_2$, and $X_3$ are uncorrelated variables each having the same standard deviation, show that the
coefficient of correlation between $X_1 + X_2$ and $X_2 + X_3$ is equal to 1/2. Why is the correlation coefficient
not zero?

3.14. In the regression $Y_i = \beta_1 + \beta_2 X_i + u_i$ suppose we multiply each X value by a constant, say, 2. Will it
change the residuals and fitted values of Y? Explain. What if we add a constant value, say, 2, to each X
value?

3.15. Show that Eq. (3.5.14) in fact measures the coefficient of determination. Hint: Apply the definition of
r given in Eq. (3.5.13) and recall that $\sum y_i \hat{y}_i = \sum (\hat{y}_i + \hat{u}_i)\hat{y}_i = \sum \hat{y}_i^2$, and remember Eq. (3.5.6).

3.16. Explain with reason whether the following statements are true, false, or uncertain:

a. Since the correlation between two variables, Y and X, can range from -1 to +1, this also means that
cov(Y, X) also lies between these limits.

b. If the correlation between two variables is zero, it means that there is no relationship between the
two variables whatsoever.

c. If you regress $Y_i$ on $\hat{Y}_i$ (i.e., actual Y on estimated Y), the intercept and slope values will be 0 and 1,
respectively.

3.17. Regression without any regressor. Suppose you are given the model: $Y_i = \beta_1 + u_i$. Use OLS to find the
estimator of $\beta_1$. What is its variance and the RSS? Does the estimated $\beta_1$ make intuitive sense? Now
consider the two-variable model $Y_i = \beta_1 + \beta_2 X_i + u_i$. Is it worth adding $X_i$ to the model? If not, why
bother with regression analysis?


Empirical Exercises 


3.18. In Table 3.5, you are given the ranks of 10 students in midterm and final examinations in statistics. 
Compute Spearman’s coefficient of rank correlation and interpret it. 


Table 3.5

              Student
Rank          A    B    C    D    E    F    G    H    I    J
Midterm       1    3    7   10    9    5    4    8    2    6
Final         3    2    8    7    9    6    5   10    1    4


3.19. The relationship between nominal exchange rate and relative prices. From annual observations from
1985 to 2005, the following regression results were obtained, where Y = exchange rate of the Canadian
dollar to the U.S. dollar (CD/$) and X = ratio of the U.S. consumer price index to the Canadian consumer
price index; that is, X represents the relative prices in the two countries:

$$\hat{Y}_t = -0.912 + 2.250X_t \qquad r^2 = 0.440$$
se = 0.096

a. Interpret this regression. How would you interpret $r^2$?
b. Does the positive value of $X_t$ make economic sense? What is the underlying economic theory?
c. Suppose we were to redefine X as the ratio of the Canadian CPI to the U.S. CPI. Would that change
the sign of X? Why?

3.20. Table 3.6 gives data on indexes of output per hour (X) and real compensation per hour (Y) for the 
business and nonfarm business sectors of the U.S. economy for 1960-2005. The base year of the 
indexes is 1992 = 100 and the indexes are seasonally adjusted. 

a. Plot Y against X for the two sectors separately. 

b. What is the economic theory behind the relationship between the two variables? Does the 
scattergram support the theory? 

c. Estimate the OLS regression of Y on X. Save the results for a further look after we study Chapter 5. 
Table 3.6 Productivity and Related Data, Business Sector 1960-2005 (Index numbers, 1992 = 100; quarterly data
seasonally adjusted)

        Output per Hour of All Persons(a)     Real Compensation per Hour(b,c)
Year    Business     Nonfarm Business         Business     Nonfarm Business
        Sector       Sector                   Sector       Sector
1960    48.9         51.9                     60.8         63.3
1961    50.6         53.5                     62.5         64.8
1962    52.9         55.9                     64.6         66.7
1963    55.0         57.8                     66.1         68.1
1964    56.8         59.6                     67.7         69.3
1965    58.8         61.4                     69.1         70.5
1966    61.2         63.6                     …            72.6
1967    62.5         64.7                     …            74.5
1968    64.7         66.9                     76.2         …
1969    65.0         67.0                     …            78.1
1970    66.3         68.0                     78.8         79.2
1971    69.0         70.7                     80.2         80.7
1972    …            …                        82.6         83.2
1973    73.4         75.3                     84.3         84.7
1974    72.3         74.2                     83.3         83.8
1975    74.8         76.2                     84.1         84.5
1976    …            78.7                     86.4         86.6
1977    78.5         80.0                     87.6         88.0
1978    79.3         81.0                     89.1         89.6
1979    79.3         80.7                     89.3         89.7
1980    79.2         80.6                     89.1         89.6
1981    80.8         81.7                     89.3         89.8
1982    80.1         80.8                     90.4         90.8
1983    83.0         84.5                     90.3         90.9
1984    85.2         86.1                     90.7         91.1
1985    87.1         87.5                     92.0         92.2
1986    89.7         90.2                     94.9         95.2
1987    90.1         90.6                     95.2         95.5
1988    91.5         92.1                     96.5         96.7
1989    92.4         92.8                     95.0         95.1
1990    94.4         94.5                     96.2         96.1
1991    95.9         96.1                     97.4         97.4
1992    100.0        100.0                    100.0        100.0
1993    100.4        100.4                    99.7         99.5
1994    101.3        101.5                    99.0         99.1
1995    101.5        102.0                    98.7         98.8
1996    104.5        104.7                    99.4         99.4
1997    106.5        106.4                    100.5        100.3
1998    109.5        109.4                    105.2        104.9
1999    112.8        112.5                    108.0        107.5
2000    116.1        115.7                    112.0        111.5
2001    119.1        118.6                    …            112.8
2002    124.0        123.5                    115.7        115.1
2003    128.7        128.0                    117.7        117.1
2004    132.7        131.8                    119.0        118.2
2005    135.7        134.9                    120.2        119.3

(a) Output refers to real gross domestic product in the sector.
(b) Wages and salaries of employees plus employers' contributions for social insurance and private benefit plans.
(c) Hourly compensation divided by the consumer price index for all urban consumers for recent quarters.

Source: Economic Report of the President, 2007, Table 49.
3.21. From a sample of 10 observations, the following results were obtained:

$$\sum Y_i = 1{,}110 \qquad \sum X_i = 1{,}700 \qquad \sum X_i Y_i = 205{,}500$$
$$\sum X_i^2 = 322{,}000 \qquad \sum Y_i^2 = 132{,}100$$


with coefficient of correlation r = 0.9758. But on rechecking these calculations it was found that two 
pairs of observations were recorded: 


Y x Y x 
90 120 . 80 110 
io do ion io 


What will be the effect of this error on r? Obtain the correct r. 
3.22. Table 3.7 gives data on gold prices, the wholesale price index (WPI), and the BSE Sensex Index for
India for the period 1979-80 to 2007-08.
a. Plot gold prices, the WPI, and the Sensex index in the same scattergram.
b. An investment is supposed to be a hedge against inflation if its price and/or rate of return at least
keeps pace with inflation. To test this hypothesis, suppose you decide to fit the following models,
assuming the scatterplot in (a) suggests that this is appropriate:
$$\text{Gold Price}_t = \beta_1 + \beta_2 \text{WPI}_t + u_t$$
$$\text{Sensex Index}_t = \beta_1 + \beta_2 \text{WPI}_t + u_t$$
What can you conclude from your results?
3.23. Table 3.8 gives data on gross domestic product (GDP) for India for the years 1951-52 to 2004-05.
a. Plot the GDP data in current and constant (1999-2000) rupees against time.
b. Letting Y denote GDP and X time (measured chronologically starting with 1 for 1951-52, 2 for
1952-53, through 54 for 2004-05), see if the following model fits the GDP data:
$$Y_i = \beta_1 + \beta_2 X_i + u_i$$
Estimate this model for both nominal and real GDP.
c. How would you interpret $\beta_2$?
d. If there is a difference between $\beta_2$ estimated for current-rupee GDP and that estimated for constant-rupee
GDP, what explains the difference?
e. From your results, what can you say about the nature of inflation in India over the sample period?
3.24. Using the data given in Table I.1 of the Introduction, verify Eq. (3.7.1).
3.25. For the SAT example given in Exercise 2.16 do the following:
a. Plot the female reading score against the male reading score. 
b. If the scatterplot suggests that a linear relationship between the two seems appropriate, obtain the 
regression of female reading score on male reading score. 
c. If there is a relationship between the two reading scores, is the relationship causal? 
3.26. Repeat Exercise 3.25, substituting math scores for reading scores.
3.27. Monte Carlo study classroom assignment: Refer to the 10 X values given in Table 2.4. Let $\beta_1 = 25$
and $\beta_2 = 0.5$. Assume $u_i \sim N(0, 9)$, that is, $u_i$ are normally distributed with mean 0 and variance 9.
Generate 100 samples using these values, obtaining 100 estimates of $\beta_1$ and $\beta_2$. Graph these estimates.
What conclusions can you draw from the Monte Carlo study? Note: Most statistical packages now can
generate random variables from most well-known probability distributions. Ask your instructor for
help, in case you have difficulty generating such variables.
3.28. Using the data given in Table 3.3, plot the number of cell phone subscribers against the number of
personal computers in use. Is there any discernible relationship between the two? If so, how do you 
rationalize the relationship? 
Table 3.7 Gold Price, Sensex Index, and WPI for India, 1979-80 to 2007-08

Year         Gold price    Sensex      WPI
1979-80      1158.75       122.32      …
1980-81      1522.44       138.87      36.9
1981-82      1719.17       207.91      40.4
1982-83      1722.54       221.51      41.4
1983-84      1858.47       238.33      45.3
1984-85      1983.92       266.19      48.5
1985-86      2125.47       492.23      …
1986-87      2323.49       567.39      54
1987-88      3082.43       454.46      58.2
1988-89      3175.22       613.66      62.2
1989-90      3229.33       729.49      66.9
1990-91      3451.52       1049.53     …
1991-92      4297.63       1879.51     83.9
1992-93      4103.66       2895.67     92.3
1993-94      4531.87       2898.69     100
1994-95      4667.24       3974.91     112.6
1995-96      4957.6        3288.68     121.6
1996-97      5070.71       3469.24     127.2
1997-98      4347.07       3812.86     132.8
1998-99      4268          3294.78     140.7
1999-2000    4393.56       4658.63     145.3
2000-01      4473.6        4269.69     155.7
2001-02      4579.12       3331.95     …
2002-03      5332.36       3206.29     166.8
2003-04      5718.95       4492.19     175.9
2004-05      6145.38       5740.52     187.3
2005-06      6900.56       8278.55     195.6
2006-07      9240.32       …           206.2
2007-08      9995.62       16569       215.7

Notes:
Gold Price: Price of gold at Mumbai (Rs. per 10 grams). Source: Handbook of Statistics on Indian Economy, Reserve Bank of India, Mumbai.
Sensex Index: Bombay Stock Exchange Sensex Index (annual average), base year 1978-79 = 100. Source: Handbook of Statistics on the Indian
Securities Market 2008, SEBI, Mumbai.
WPI: Wholesale price index with 1993-94 as base year. Source: Handbook of Industrial Policy and Statistics 2007-08, Government of India.
Table 3.8 Nominal and Real Gross Domestic Product, 1951-52 to 2004-05 (Rupee Crore)


Year NGDP RGDP Year NGDP RGDP Year NGDP RGDP 
1951-52 10721 242995 1970-71 46249 517148 1989-90 487684 1131111 
1952-53 10522 249386 1971-72 29523 525584 1990-91 569624 1193650 
1953-54 11452 264720 1972-73 54591 522698 1991-92 654729 1206346 
1954-55 10834 277428 1973-74 66428 540045 1992-93 752591 1272457 
1955-56 11030 286370 1974-75 78426 546443 1993-94 865805 1333123 
1956-57 13140 302352 1975-76 84221 596428 1994-95 1015764 1421831 
1957-58 13536 301063 1976-77 90751 606301 1995-96 1191813 1529453 
1958-59 15086 323324 1977-78 102796 650311 1996-97 1378617 1645037 
1959-60 15895 331784 1978-79 111371 687435 1997-98 1527158 1711735 
1960-61 17407 350117 1979-80 122155 651430 1998-99 1751199 1817752 
1961-62 18445 363110 1980-81 145370 695361 1999-00 1952036 1952035 
1962-63 19826 373698 1981-82 170805 737078 2000-01 2102314 2030711 
1963-64 22774 396034 1982-83 191059 762622 2001-02 2278952 2136651 
1964-65 26563 425560 1983-84 222485 818288 2002-03 2424261 2217133 
1965-66 28016 414263 1984-85 249268 849573 2003-04 2754620 2402727 
1966-67 31711 414115 1985-86 281330 894041 2004-05 3149407 2602065 
1967-68 37133 446548 1986-87 314816 936671 
1968-69 39324 461612 1987-88 357861 973739 
1969-70 43298 491798 1988-89 424531 1067582 


Note: NGDP: Nominal Gross Domestic Product in Rupee Crore at market price 
RGDP: Real Gross Domestic Product in Rupee Crore at 1999-2000 prices 


Source: Handbook of Statistics on Indian Economy, RBI, 2009-10 


Key to Multiple Choice Questions 
Appendix 3A


3A.1 Derivation of Least-Squares Estimates


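Differentiating Eq. (3.1.2) partially with respect to $\hat{\beta}_1$ and $\hat{\beta}_2$, we obtain

$$\frac{\partial \left(\sum \hat{u}_i^2\right)}{\partial \hat{\beta}_1} = -2\sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i) = -2\sum \hat{u}_i \qquad (1)$$

$$\frac{\partial \left(\sum \hat{u}_i^2\right)}{\partial \hat{\beta}_2} = -2\sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)X_i = -2\sum \hat{u}_i X_i \qquad (2)$$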


Setting these equations to zero, after algebraic simplification and manipulation, gives the estimators given in Eqs. (3.1.6) 
and (3.1.7). 
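Setting (1) and (2) to zero yields the two normal equations $\sum Y_i = n\hat{\beta}_1 + \hat{\beta}_2\sum X_i$ and $\sum X_iY_i = \hat{\beta}_1\sum X_i + \hat{\beta}_2\sum X_i^2$. The minimal sketch below solves this 2-by-2 system by Cramer's rule (a choice made here for transparency, not the text's method), so its answers should coincide with the closed forms of Eqs. (3.1.6) and (3.1.7). It then checks that the residuals satisfy $\sum \hat{u}_i = 0$ and $\sum \hat{u}_i X_i = 0$, which is exactly what setting (1) and (2) to zero requires. The data are hypothetical.

```python
def solve_normal_equations(x, y):
    """Solve the two normal equations for (b1_hat, b2_hat) by Cramer's rule."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx  # zero only if all X values are equal
    if det == 0:
        raise ValueError("X must take at least two distinct values")
    b1 = (sy * sxx - sx * sxy) / det
    b2 = (n * sxy - sx * sy) / det
    return b1, b2


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b1, b2 = solve_normal_equations(x, y)
resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
print(b1, b2)                                              # about 2.2 and 0.6
print(sum(resid), sum(u * xi for u, xi in zip(resid, x)))  # both about 0
```

The guard on `det` mirrors the CLRM requirement that the X values not all be the same: with no variation in X, the normal equations have no unique solution.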
3A.2 Linearity and Unbiasedness Properties of Least-Squares Estimators 


From Eq. (3.1.8) we have

$$\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \sum k_i Y_i \qquad (3)$$

where

$$k_i = \frac{x_i}{\sum x_i^2}$$

which shows that $\hat{\beta}_2$ is a linear estimator because it is a linear function of Y; actually it is a weighted average of $Y_i$ with
$k_i$ serving as the weights. It can similarly be shown that $\hat{\beta}_1$ too is a linear estimator.

Incidentally, note these properties of the weights $k_i$:

1. Since the $X_i$ are assumed to be nonstochastic, the $k_i$ are nonstochastic too.
2. $\sum k_i = 0$.
3. $\sum k_i^2 = 1/\sum x_i^2$.
4. $\sum k_i x_i = \sum k_i X_i = 1$.

These properties can be directly verified from the definition of $k_i$. For example,

$$\sum k_i = \sum \left(\frac{x_i}{\sum x_i^2}\right) = \frac{1}{\sum x_i^2}\sum x_i \qquad \text{since for a given sample } \sum x_i^2 \text{ is known}$$
$$= 0 \qquad \text{since } \sum x_i \text{, the sum of deviations from the mean value, is always zero}$$

Now substitute the PRF $Y_i = \beta_1 + \beta_2 X_i + u_i$ into Equation (3) to obtain

$$\begin{aligned}
\hat{\beta}_2 &= \sum k_i(\beta_1 + \beta_2 X_i + u_i) \\
&= \beta_1 \sum k_i + \beta_2 \sum k_i X_i + \sum k_i u_i \\
&= \beta_2 + \sum k_i u_i
\end{aligned} \qquad (4)$$

where use is made of the properties of $k_i$ noted earlier.
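The four properties of the weights $k_i$, and the decomposition in Eq. (4), are easy to confirm numerically. The sketch below uses hypothetical X values, one hypothetical draw of the disturbances, and assumed values $\beta_1 = 3$, $\beta_2 = 0.5$; none of these come from the text.

```python
def ls_weights(x):
    """Least-squares weights k_i = x_i / sum(x_i^2), with x_i = X_i - mean(X)."""
    x_bar = sum(x) / len(x)
    dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dev)
    return [d / sxx for d in dev], dev, sxx


X = [2.0, 4.0, 5.0, 7.0, 12.0]      # hypothetical, nonstochastic X
u = [0.3, -1.1, 0.4, 0.9, -0.5]     # one hypothetical draw of the disturbances
beta1, beta2 = 3.0, 0.5             # assumed "true" parameters
Y = [beta1 + beta2 * xi + ui for xi, ui in zip(X, u)]

k, dev, sxx = ls_weights(X)
print(sum(k))                                    # property 2: about 0
print(sum(ki * ki for ki in k), 1 / sxx)         # property 3: equal
print(sum(ki * d for ki, d in zip(k, dev)),      # property 4: both about 1
      sum(ki * xi for ki, xi in zip(k, X)))
b2_hat = sum(ki * yi for ki, yi in zip(k, Y))    # Eq. (3): b2_hat = sum(k_i Y_i)
print(b2_hat, beta2 + sum(ki * ui for ki, ui in zip(k, u)))  # Eq. (4): equal
```

The last line shows why unbiasedness follows: $\hat{\beta}_2$ differs from $\beta_2$ only by the weighted sum $\sum k_i u_i$, whose expectation is zero when $E(u_i) = 0$.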
Now taking expectation of Equation (4) on both sides and noting that the $k_i$, being nonstochastic, can be treated as
constants, we obtain

$$E(\hat{\beta}_2) = \beta_2 + \sum k_i E(u_i) = \beta_2 \qquad (5)$$

since $E(u_i) = 0$ by assumption. Therefore, $\hat{\beta}_2$ is an unbiased estimator of $\beta_2$. Likewise, it can be proved that $\hat{\beta}_1$ is also an
unbiased estimator of $\beta_1$.


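3A.3 Variances and Standard Errors of Least-Squares Estimators
Now by the definition of variance, we can write

$$\begin{aligned}
\text{var}(\hat{\beta}_2) &= E[\hat{\beta}_2 - E(\hat{\beta}_2)]^2 \\
&= E(\hat{\beta}_2 - \beta_2)^2 && \text{since } E(\hat{\beta}_2) = \beta_2 \\
&= E\left(\sum k_i u_i\right)^2 && \text{using Eq. (4) above} \qquad (6) \\
&= E(k_1^2 u_1^2 + k_2^2 u_2^2 + \cdots + k_n^2 u_n^2 + 2k_1k_2u_1u_2 + \cdots + 2k_{n-1}k_n u_{n-1}u_n)
\end{aligned}$$

Since by assumption, $E(u_i^2) = \sigma^2$ for each i and $E(u_iu_j) = 0$, $i \ne j$, it follows that

$$\begin{aligned}
\text{var}(\hat{\beta}_2) &= \sigma^2 \sum k_i^2 \\
&= \frac{\sigma^2}{\sum x_i^2} && \text{(using the definition of } k_i^2\text{)} \qquad (7) \\
&= \text{Eq. (3.3.1)}
\end{aligned}$$

The variance of $\hat{\beta}_1$ can be obtained following the same line of reasoning already given. Once the variances of
$\hat{\beta}_1$ and $\hat{\beta}_2$ are obtained, their positive square roots give the corresponding standard errors.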


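3A.4 Covariance between $\hat{\beta}_1$ and $\hat{\beta}_2$
By definition,

$$\begin{aligned}
\text{cov}(\hat{\beta}_1, \hat{\beta}_2) &= E\{[\hat{\beta}_1 - E(\hat{\beta}_1)][\hat{\beta}_2 - E(\hat{\beta}_2)]\} \\
&= E(\hat{\beta}_1 - \beta_1)(\hat{\beta}_2 - \beta_2) && \text{(Why?)} \\
&= -\bar{X}E(\hat{\beta}_2 - \beta_2)^2 && (8) \\
&= -\bar{X}\,\text{var}(\hat{\beta}_2) \\
&= \text{Eq. (3.3.9)}
\end{aligned}$$

where use is made of the fact that $\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2\bar{X}$ and $E(\hat{\beta}_1) = \bar{Y} - \beta_2\bar{X}$, giving $\hat{\beta}_1 - E(\hat{\beta}_1) = -\bar{X}(\hat{\beta}_2 - \beta_2)$. Note:
$\text{var}(\hat{\beta}_2)$ is given in Eq. (3.3.1).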
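Equations (7) and (8) can be checked by simulation: draw many samples with the X values held fixed, compute $\hat{\beta}_1$ and $\hat{\beta}_2$ in each, and compare their sample variance and covariance with $\sigma^2/\sum x_i^2$ and $-\bar{X}\sigma^2/\sum x_i^2$. This is a sketch under stated assumptions: X = 1, 2, ..., 10, $\beta_1 = 1$, $\beta_2 = 2$, $\sigma = 2$, and normally distributed disturbances are all illustrative choices, and the agreement is only approximate because the number of replications is finite.

```python
import random


def sampling_moments(x, beta1, beta2, sigma, reps, seed=0):
    """Simulated var(b2_hat) and cov(b1_hat, b2_hat) with X held fixed."""
    rng = random.Random(seed)
    n = len(x)
    x_bar = sum(x) / n
    dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dev)
    b1s, b2s = [], []
    for _ in range(reps):
        y = [beta1 + beta2 * xi + rng.gauss(0.0, sigma) for xi in x]
        b2 = sum(d * yi for d, yi in zip(dev, y)) / sxx  # b2_hat = sum(k_i Y_i)
        b1 = sum(y) / n - b2 * x_bar                     # b1_hat = Ybar - b2_hat * Xbar
        b1s.append(b1)
        b2s.append(b2)
    m1 = sum(b1s) / reps
    m2 = sum(b2s) / reps
    var_b2 = sum((b - m2) ** 2 for b in b2s) / (reps - 1)
    cov_12 = sum((a - m1) * (b - m2) for a, b in zip(b1s, b2s)) / (reps - 1)
    return var_b2, cov_12, x_bar, sxx


x = [float(i) for i in range(1, 11)]
sigma = 2.0
var_b2, cov_12, x_bar, sxx = sampling_moments(x, 1.0, 2.0, sigma, reps=20000)
theory_var = sigma ** 2 / sxx              # Eq. (7)
print(var_b2, theory_var)
print(cov_12, -x_bar * theory_var)         # Eq. (8)
```

Because $\bar{X} = 5.5$ is positive here, the covariance comes out negative: samples that overestimate the slope tend to underestimate the intercept.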
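3A.5 The Least-Squares Estimator of $\sigma^2$

Recall that
$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad (9)$$
Therefore,
$$\bar{Y} = \beta_1 + \beta_2 \bar{X} + \bar{u} \qquad (10)$$
Subtracting Equation (10) from Equation (9) gives
$$y_i = \beta_2 x_i + (u_i - \bar{u}) \qquad (11)$$
Also recall that
$$\hat{u}_i = y_i - \hat{\beta}_2 x_i \qquad (12)$$
Therefore, substituting Equation (11) into Equation (12) yields
$$\hat{u}_i = \beta_2 x_i + (u_i - \bar{u}) - \hat{\beta}_2 x_i \qquad (13)$$
Collecting terms, squaring, and summing on both sides, we obtain
$$\sum \hat{u}_i^2 = (\hat{\beta}_2 - \beta_2)^2 \sum x_i^2 + \sum (u_i - \bar{u})^2 - 2(\hat{\beta}_2 - \beta_2)\sum x_i(u_i - \bar{u}) \qquad (14)$$
Taking expectations on both sides gives
$$\begin{aligned}
E\left(\sum \hat{u}_i^2\right) &= \sum x_i^2 E(\hat{\beta}_2 - \beta_2)^2 + E\left[\sum (u_i - \bar{u})^2\right] - 2E\left[(\hat{\beta}_2 - \beta_2)\sum x_i(u_i - \bar{u})\right] \\
&= \sum x_i^2 \,\text{var}(\hat{\beta}_2) + (n - 1)\,\text{var}(u_i) - 2E\left[\left(\sum k_i u_i\right)\left(\sum x_i u_i\right)\right] \\
&= \sigma^2 + (n - 1)\sigma^2 - 2E\left(\sum k_i x_i u_i^2\right) && (15) \\
&= \sigma^2 + (n - 1)\sigma^2 - 2\sigma^2 \\
&= (n - 2)\sigma^2
\end{aligned}$$
where, in the last but one step, use is made of the definition of $k_i$ given in Eq. (3) and the relation given in Eq. (4). Also
note that
$$E\left[\sum (u_i - \bar{u})^2\right] = E\left[\sum u_i^2 - n\bar{u}^2\right] = n\sigma^2 - n\left(\frac{\sigma^2}{n}\right) = (n - 1)\sigma^2$$
where use is made of the fact that the $u_i$ are uncorrelated and the variance of each $u_i$ is $\sigma^2$.

Thus, we obtain
$$E\left(\sum \hat{u}_i^2\right) = (n - 2)\sigma^2 \qquad (16)$$
Therefore, if we define
$$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - 2} \qquad (17)$$
its expected value is
$$E(\hat{\sigma}^2) = \frac{1}{n - 2}E\left(\sum \hat{u}_i^2\right) = \sigma^2 \qquad \text{using Equation (16)} \qquad (18)$$
which shows that $\hat{\sigma}^2$ is an unbiased estimator of true $\sigma^2$.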
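Equations (16) through (18) say that dividing the residual sum of squares by n - 2, not by n, gives an unbiased estimator of $\sigma^2$. A small simulation makes the difference visible. It is a sketch under assumptions of our own: a tiny sample (n = 5, so that the divide-by-n version is biased by the large factor (n - 2)/n = 0.6), hypothetical X values, assumed parameters $\beta_1 = 10$ and $\beta_2 = 0.8$, and normal disturbances with $\sigma^2 = 1$.

```python
import random


def residual_sum_of_squares(x, y):
    """RSS = sum(u_hat_i^2) from the two-variable OLS fit of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


def average_sigma2_estimates(reps=20000, seed=42):
    """Average RSS/(n-2) and RSS/n over simulated samples (true sigma^2 = 1)."""
    rng = random.Random(seed)
    x = [1.0, 3.0, 4.0, 6.0, 9.0]   # hypothetical fixed X values
    n = len(x)
    unbiased, biased = 0.0, 0.0
    for _ in range(reps):
        y = [10.0 + 0.8 * xi + rng.gauss(0.0, 1.0) for xi in x]
        rss = residual_sum_of_squares(x, y)
        unbiased += rss / (n - 2)   # Eq. (17)
        biased += rss / n           # divide-by-n alternative, for comparison
    return unbiased / reps, biased / reps


print(average_sigma2_estimates())   # first value near 1.0, second near 0.6
```

The two degrees of freedom lost are the ones used up in estimating $\beta_1$ and $\beta_2$, which is why the residuals satisfy two restrictions and their sum of squares is systematically smaller than $\sum (u_i - \bar{u})^2$ would suggest.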


3A.6 Minimum-Variance Property of Least-Squares Estimators 


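It was shown in Appendix 3A, Section 3A.2, that the least-squares estimator $\hat{\beta}_2$ is linear as well as unbiased (this holds
true of $\hat{\beta}_1$ too). To show that these estimators are also minimum variance in the class of all linear unbiased estimators,
consider the least-squares estimator $\hat{\beta}_2$:

$$\hat{\beta}_2 = \sum k_i Y_i$$

where

$$k_i = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2} = \frac{x_i}{\sum x_i^2} \qquad \text{(see Appendix 3A.2)} \qquad (19)$$

which shows that $\hat{\beta}_2$ is a weighted average of the Y's, with $k_i$ serving as the weights.
Let us define an alternative linear estimator of $\beta_2$ as follows:

$$\beta_2^* = \sum w_i Y_i \qquad (20)$$

where $w_i$ are also weights, not necessarily equal to $k_i$. Now

$$\begin{aligned}
E(\beta_2^*) &= \sum w_i E(Y_i) \\
&= \sum w_i(\beta_1 + \beta_2 X_i) && (21) \\
&= \beta_1 \sum w_i + \beta_2 \sum w_i X_i
\end{aligned}$$

Therefore, for $\beta_2^*$ to be unbiased, we must have

$$\sum w_i = 0 \qquad (22)$$

and

$$\sum w_i X_i = 1 \qquad (23)$$

Also, we may write

$$\begin{aligned}
\text{var}(\beta_2^*) &= \text{var}\sum w_i Y_i \\
&= \sum w_i^2 \,\text{var}\, Y_i && [\text{Note: var } Y_i = \text{var } u_i = \sigma^2] \\
&= \sigma^2 \sum w_i^2 && [\text{Note: cov}(Y_i, Y_j) = 0\ (i \ne j)] \\
&= \sigma^2 \sum \left(w_i - \frac{x_i}{\sum x_i^2} + \frac{x_i}{\sum x_i^2}\right)^2 && \text{(Note the mathematical trick)} \\
&= \sigma^2 \sum \left(w_i - \frac{x_i}{\sum x_i^2}\right)^2 + \sigma^2 \frac{\sum x_i^2}{\left(\sum x_i^2\right)^2} + 2\sigma^2 \sum \left(w_i - \frac{x_i}{\sum x_i^2}\right)\left(\frac{x_i}{\sum x_i^2}\right) \\
&= \sigma^2 \sum \left(w_i - \frac{x_i}{\sum x_i^2}\right)^2 + \sigma^2 \left(\frac{1}{\sum x_i^2}\right) && (24)
\end{aligned}$$

because the last term in the next to the last step drops out. (Why?)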
Since the last term in Equation (24) is constant, the variance of $\beta_2^*$ can be minimized only by manipulating the first
term. If we let

$$w_i = \frac{x_i}{\sum x_i^2}$$

Eq. (24) reduces to

$$\begin{aligned}
\text{var}(\beta_2^*) &= \frac{\sigma^2}{\sum x_i^2} && (25) \\
&= \text{var}(\hat{\beta}_2)
\end{aligned}$$

In words, with weights $w_i = k_i$, which are the least-squares weights, the variance of the linear estimator $\beta_2^*$ is equal to
the variance of the least-squares estimator $\hat{\beta}_2$; otherwise $\text{var}(\beta_2^*) > \text{var}(\hat{\beta}_2)$. To put it differently, if there is a
minimum-variance linear unbiased estimator of $\beta_2$, it must be the least-squares estimator. Similarly it can be shown that
$\hat{\beta}_1$ is a minimum-variance linear unbiased estimator of $\beta_1$.
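The argument can be illustrated with one concrete alternative estimator. In the sketch below (hypothetical X = 1, ..., 5 and an assumed $\sigma^2 = 4$), the alternative weights are $w_i = k_i + c\,d_i$, where the vector d = (1, -2, 1, 0, 0) is chosen, for illustration only, so that $\sum d_i = 0$ and $\sum d_i X_i = 0$. Then $\beta_2^*$ still satisfies (22) and (23), so it is linear and unbiased, yet its variance $\sigma^2\sum w_i^2$ exceeds $\sigma^2\sum k_i^2$ whenever $c \ne 0$.

```python
def gauss_markov_demo(c, sigma2=4.0):
    """Compare var of the OLS slope with an alternative linear unbiased slope."""
    X = [1.0, 2.0, 3.0, 4.0, 5.0]                  # hypothetical X values
    x_bar = sum(X) / len(X)
    dev = [xi - x_bar for xi in X]
    sxx = sum(d * d for d in dev)
    k = [d / sxx for d in dev]                     # least-squares weights, Eq. (19)
    d = [1.0, -2.0, 1.0, 0.0, 0.0]                 # sum(d) = 0 and sum(d * X) = 0
    w = [ki + c * di for ki, di in zip(k, d)]      # alternative weights, Eq. (20)
    sum_w = sum(w)                                 # must be 0, Eq. (22)
    sum_wx = sum(wi * xi for wi, xi in zip(w, X))  # must be 1, Eq. (23)
    var_ls = sigma2 * sum(ki * ki for ki in k)     # Eq. (25)
    var_alt = sigma2 * sum(wi * wi for wi in w)    # sigma^2 * sum(w_i^2)
    return sum_w, sum_wx, var_ls, var_alt


for c in (0.0, 0.05, -0.2):
    print(c, gauss_markov_demo(c))
```

With c = 0.05, for example, the least-squares variance is 0.4 while the alternative's is 0.46; only c = 0, that is $w_i = k_i$, attains the minimum.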


3A.7 Consistency of Least-Squares Estimators 


We have shown that, in the framework of the classical linear regression model, the least-squares estimators are unbiased
(and efficient) in any sample size, small or large. But sometimes, as discussed in Appendix A, an estimator may not
satisfy one or more desirable statistical properties in small samples, yet as the sample size increases indefinitely the
estimator acquires several desirable statistical properties. These properties are known as the large-sample, or
asymptotic, properties. In this appendix, we will discuss one large-sample property, namely, the property of consistency,
which is discussed more fully in Appendix A. For the two-variable model we have already shown that the OLS estimator
$\hat{\beta}_2$ is an unbiased estimator of the true $\beta_2$. Now we show that $\hat{\beta}_2$ is also a consistent estimator of $\beta_2$. As shown in
Appendix A, a sufficient condition for consistency is that $\hat{\beta}_2$ is unbiased and that its variance tends to zero as the
sample size n tends to infinity.

Since we have already proved the unbiasedness property, we need only show that the variance of $\hat{\beta}_2$ tends to zero as
n increases indefinitely. We know that

$$\text{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_i^2} = \frac{\sigma^2/n}{\sum x_i^2/n} \qquad (26)$$

By dividing the numerator and denominator by n, we do not change the equality.

Now

$$\lim_{n \to \infty} \text{var}(\hat{\beta}_2) = \lim_{n \to \infty} \left(\frac{\sigma^2/n}{\sum x_i^2/n}\right) = 0 \qquad (27)$$

where use is made of the facts that (1) the limit of a ratio is the ratio of the limit of the numerator to the limit
of the denominator, provided the latter is not zero (refer to any calculus book); (2) as n tends to infinity, $\sigma^2/n$ tends
to zero because $\sigma^2$ is a finite number; and (3) $(\sum x_i^2)/n$ does not tend to zero because the variance of X has a finite,
nonzero limit, in keeping with Assumption 7 of the CLRM.

The upshot of the preceding discussion is that the OLS estimator $\hat\beta_2$ is a consistent estimator of the true $\beta_2$. In like fashion, we can establish that $\hat\beta_1$ is also a consistent estimator. Thus, in repeated (small) samples, the OLS estimators are unbiased, and as the sample size increases indefinitely, the OLS estimators are consistent. As we shall see later, even if some of the assumptions of the CLRM are not satisfied, we may be able to obtain consistent estimators of the regression coefficients in several situations.
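To see Eq. (27) at work, the Python sketch below compares the exact variance $\sigma^2/\sum x_i^2$ from Eq. (26) with a Monte Carlo estimate of the sampling variance of $\hat\beta_2$ as $n$ grows. The regressor design (X cycling through 1, ..., 10), the parameter values, the number of replications, and the seed are all hypothetical choices made for illustration. The cycling design keeps $\sum x_i^2/n$ finite and nonzero, in the spirit of Assumption 7.

```python
import random
import statistics


def design(n):
    """Hypothetical fixed regressor: X cycles through 1, ..., 10, so sum(x_i^2)/n stays finite and nonzero."""
    return [float(i % 10 + 1) for i in range(n)]


def ols_slope(x, y):
    """OLS slope: sum of x_i * y_i over sum of x_i^2, both in deviation form."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def exact_var_b2(x, sigma):
    """Eq. (26): var(b2) = sigma^2 / sum(x_i^2), with x_i = X_i - mean(X)."""
    xbar = statistics.fmean(x)
    return sigma ** 2 / sum((xi - xbar) ** 2 for xi in x)


def simulate_b2(n, beta1=1.0, beta2=0.5, sigma=2.0, reps=2000, seed=0):
    """Draw `reps` samples of size n with normal errors; return mean and variance of the OLS slopes."""
    rng = random.Random(seed)
    x = design(n)
    slopes = []
    for _ in range(reps):
        y = [beta1 + beta2 * xi + rng.gauss(0.0, sigma) for xi in x]
        slopes.append(ols_slope(x, y))
    return statistics.fmean(slopes), statistics.variance(slopes)


if __name__ == "__main__":
    for n in (10, 40, 160):
        mean_b2, var_b2 = simulate_b2(n)
        print(f"n={n:4d}  mean(b2)={mean_b2:.3f}  simulated var={var_b2:.5f}  "
              f"exact var={exact_var_b2(design(n), 2.0):.5f}")
```

With $\sigma = 2$, each fourfold increase in $n$ cuts the exact variance by a factor of four, to about 0.0485, 0.0121, and 0.0030. The simulated values should track these while the mean of $\hat\beta_2$ stays near its true value.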


CHAPTER 4


Classical Normal Linear 
Regression Model (CNLRM) 


What is known as the classical theory of statistical inference consists of two branches, namely, estimation and hypothesis testing. We have thus far covered the topic of estimation of the parameters of the (two-variable) linear regression model. Using the method of OLS we were able to estimate the parameters $\beta_1$, $\beta_2$, and $\sigma^2$. Under the assumptions of the classical linear regression model (CLRM), we were able to show that the estimators of these parameters, $\hat\beta_1$, $\hat\beta_2$, and $\hat\sigma^2$, satisfy several desirable statistical properties, such as unbiasedness, minimum variance, etc. (Recall the BLUE property.) Note that, since these are estimators, their values will change from sample to sample. Therefore, these estimators are random variables.

But estimation is only half the battle. Hypothesis testing is the other half. Recall that in regression analysis our objective is not only to estimate the sample regression function (SRF), but also to use it to draw inferences about the population regression function (PRF), as emphasized in Chapter 2. Thus, we would like to find out how close $\hat\beta_1$ is to the true $\beta_1$ or how close $\hat\sigma^2$ is to the true $\sigma^2$. For instance, in Example 3.2, we estimated the SRF as shown in Eq. (3.7.2). But since this regression is based on a sample of 55 families, how do we know that the estimated MPC of 0.4368 is close to the (true) MPC in the population as a whole?

Therefore, since $\hat\beta_1$, $\hat\beta_2$, and $\hat\sigma^2$ are random variables, we need to find out their probability distributions, for without that knowledge we will not be able to relate them to their true values.


4.1 The Probability Distribution of Disturbances $u_i$
To find out the probability distributions of the OLS estimators, we proceed as follows. Specifically, consider $\hat\beta_2$. As we showed in Appendix 3A.2,

$$\hat\beta_2 = \sum k_i Y_i \tag{4.1.1}$$


where $k_i = x_i / \sum x_i^2$. But since the $X$'s are assumed fixed, or nonstochastic, because ours is conditional regression analysis, conditional on the fixed values of $X_i$, Eq. (4.1.1) shows that $\hat\beta_2$ is a linear function of $Y_i$, which is random by assumption. But since $Y_i = \beta_1 + \beta_2 X_i + u_i$, we can write Eq. (4.1.1) as


$$\hat\beta_2 = \sum k_i (\beta_1 + \beta_2 X_i + u_i) \tag{4.1.2}$$
Because $k_i$, the betas, and $X_i$ are all fixed, $\hat\beta_2$ is ultimately a linear function of $u_i$, which is random by assumption. Therefore, the probability distribution of $\hat\beta_2$ (and also of $\hat\beta_1$) will depend on the assumption made about the probability distribution of $u_i$. And since knowledge of the probability distributions of OLS estimators is necessary to draw inferences about their population values, the nature of the probability distribution of $u_i$ assumes an extremely important role in hypothesis testing.
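The algebra behind Eqs. (4.1.1) and (4.1.2) can be checked numerically. The weights satisfy $\sum k_i = 0$ and $\sum k_i X_i = 1$, so Eq. (4.1.2) collapses to $\hat\beta_2 = \beta_2 + \sum k_i u_i$, a linear function of the $u_i$ alone. The Python sketch below uses hypothetical values for $X$, $\beta_1$, $\beta_2$, and the error draws. Only the identities themselves come from the text.

```python
import random


def k_weights(X):
    """k_i = x_i / sum(x_i^2), where x_i = X_i - mean(X) is X in deviation form."""
    xbar = sum(X) / len(X)
    dev = [xi - xbar for xi in X]
    sxx = sum(d * d for d in dev)
    return [d / sxx for d in dev]


rng = random.Random(42)
X = [float(v) for v in range(1, 21)]        # hypothetical fixed regressor values
beta1, beta2 = 3.0, 0.8                     # hypothetical true parameters
u = [rng.gauss(0.0, 1.5) for _ in X]        # one hypothetical draw of the disturbances
Y = [beta1 + beta2 * xi + ui for xi, ui in zip(X, u)]

k = k_weights(X)
sum_k = sum(k)                                             # should be 0
sum_kX = sum(ki * xi for ki, xi in zip(k, X))              # should be 1
b2_from_Y = sum(ki * yi for ki, yi in zip(k, Y))           # Eq. (4.1.1)
b2_from_u = beta2 + sum(ki * ui for ki, ui in zip(k, u))   # beta2 + sum k_i u_i

print(f"sum k_i = {sum_k:.2e}, sum k_i X_i = {sum_kX:.6f}")
print(f"b2 from Y: {b2_from_Y:.6f}, b2 from u: {b2_from_u:.6f}")
```

The two values of $\hat\beta_2$ agree up to rounding error. A different draw of the $u$'s changes both by the same amount, which is why the distribution of $\hat\beta_2$ is inherited from that of the $u_i$.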

Since the method of OLS does not make any assumption about the probabilistic nature of $u_i$, it is of little help for the purpose of drawing inferences about the PRF from the SRF, the Gauss–Markov theorem notwithstanding. This void can be filled if we are willing to assume that the $u$'s follow some probability distribution. For reasons to be explained shortly, in the regression context it is usually assumed that the $u$'s follow the normal distribution. Adding the normality assumption for $u_i$ to the assumptions of the classical linear regression model (CLRM) discussed in Chapter 3, we obtain what is known as the classical normal linear regression model (CNLRM).


4.2 The Normality Assumption for $u_i$


The classical normal linear regression model assumes that each $u_i$ is distributed normally with

$$\text{Mean:}\quad E(u_i) = 0 \tag{4.2.1}$$
$$\text{Variance:}\quad E[u_i - E(u_i)]^2 = E(u_i^2) = \sigma^2 \tag{4.2.2}$$
$$\operatorname{cov}(u_i, u_j):\quad E\{[u_i - E(u_i)][u_j - E(u_j)]\} = E(u_i u_j) = 0, \quad i \neq j \tag{4.2.3}$$

The assumptions given above can be more compactly stated as

$$u_i \sim N(0, \sigma^2) \tag{4.2.4}$$


where the symbol ~ means distributed as and N stands for the normal distribution, the terms in the paren- 
theses representing the two parameters of the normal distribution, namely, the mean and the variance. 

As noted in Appendix A, for two (jointly) normally distributed variables, zero covariance or correlation means independence of the two variables. Therefore, with the normality assumption, Eq. (4.2.4) means that $u_i$ and $u_j$ are not only uncorrelated but are also independently distributed.

Therefore, we can write Eq. (4.2.4) as

$$u_i \sim \text{NID}(0, \sigma^2) \tag{4.2.5}$$

where NID stands for normally and independently distributed.


Why the Normality Assumption? 


Why do we employ the normality assumption? There are several reasons: 

1. As pointed out in Section 2.5, $u_i$ represents the combined influence (on the dependent variable) of a large number of independent variables that are not explicitly introduced in the regression model. As noted, we hope that the influence of these omitted or neglected variables is small and at best random. Now by the celebrated central limit theorem (CLT) of statistics (see Appendix A for details), it can be shown that if there are a large number of independent and identically distributed random variables, then, with a few exceptions, the distribution of their sum tends to a normal distribution as the number of such variables increases indefinitely.¹ It is the CLT that provides a theoretical justification for the assumption of normality of $u_i$.


¹For a relatively simple and straightforward discussion of this theorem, see Sheldon M. Ross, Introduction to Probability and Statistics for Engineers and Scientists, 2d ed., Harcourt Academic Press, New York, 2000, pp. 193–194. One exception to the theorem is the Cauchy distribution, which has no mean or higher moments. See M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, Charles Griffin & Co., London, 1960, vol. 1, pp. 248–249.
2. A variant of the CLT states that, even if the number of variables is not very large or if these variables are not strictly independent, their sum may still be normally distributed.²

3. With the normality assumption, the probability distributions of OLS estimators can be easily derived because, as noted in Appendix A, one property of the normal distribution is that any linear function of normally distributed variables is itself normally distributed. As we discussed earlier, the OLS estimators $\hat\beta_1$ and $\hat\beta_2$ are linear functions of $u_i$. Therefore, if the $u_i$ are normally distributed, so are $\hat\beta_1$ and $\hat\beta_2$, which makes our task of hypothesis testing very straightforward.

4. The normal distribution is a comparatively simple distribution involving only two parameters (mean and 
variance); it is very well known and its theoretical properties have been extensively studied in mathematical 
statistics. Besides, many phenomena seem to follow the normal distribution. 

5. If we are dealing with a small, or finite, sample size, say data of fewer than 100 observations, the normality assumption assumes a critical role. It not only helps us to derive the exact probability distributions of OLS estimators but also enables us to use the $t$, $F$, and $\chi^2$ statistical tests for regression models. The statistical properties of the $t$, $F$, and $\chi^2$ probability distributions are discussed in Appendix A. As we will show subsequently, if the sample size is reasonably large, we may be able to relax the normality assumption.

6. Finally, in large samples, the $t$ and $F$ statistics have approximately the $t$ and $F$ probability distributions, so that the $t$ and $F$ tests that are based on the assumption that the error term is normally distributed can still be applied validly.³ These days there are many cross-section and time series data sets that have a fairly large number of observations. Therefore, the normality assumption may not be very crucial in large data sets.
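Reason 1 can be illustrated with a small simulation. The sketch below treats each disturbance as the sum of $m$ independent, non-normal "influences", here hypothetically Uniform(−0.5, 0.5) draws. It standardizes the sum and checks two signatures of normality: the share of values within ±1.96 (0.95 for a standard normal) and the excess kurtosis (0 for a normal). The choice of the uniform distribution, the values of $m$, the number of draws, and the seed are illustrative assumptions.

```python
import math
import random


def standardized_sums(m, draws=20000, seed=1):
    """Return `draws` sums of m independent Uniform(-0.5, 0.5) variables, scaled to unit variance."""
    rng = random.Random(seed)
    sd = math.sqrt(m / 12.0)  # each Uniform(-0.5, 0.5) draw has variance 1/12
    return [sum(rng.uniform(-0.5, 0.5) for _ in range(m)) / sd for _ in range(draws)]


def share_within(z, c=1.96):
    """Fraction of values with |z| < c; about 0.95 for a standard normal when c = 1.96."""
    return sum(abs(v) < c for v in z) / len(z)


def excess_kurtosis(z):
    """Sample excess kurtosis m4 / m2^2 - 3; zero for a normal distribution."""
    n = len(z)
    mean = sum(z) / n
    m2 = sum((v - mean) ** 2 for v in z) / n
    m4 = sum((v - mean) ** 4 for v in z) / n
    return m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    for m in (1, 2, 30):
        z = standardized_sums(m)
        print(f"m={m:2d}  share within 1.96: {share_within(z):.4f}  "
              f"excess kurtosis: {excess_kurtosis(z):+.3f}")
```

With $m = 1$, every standardized value lies within ±√3 ≈ 1.732, so the share is exactly 1 and the excess kurtosis is near −1.2, the uniform's value. By $m = 30$, both statistics are close to their normal benchmarks (the theoretical excess kurtosis is −1.2/m).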

A cautionary note: Since we are “imposing” the normality assumption, it behooves us to find out in 
practical applications involving small sample size data whether the normality assumption is appropriate. 
Later, we will develop some tests to do just that. Also, later we will come across situations where the 
normality assumption may be inappropriate. But until then we will continue with the normality assumption 
for the reasons discussed previously.


4.3 Properties of OLS Estimators under the Normality Assumption 


With the assumption that $u_i$ follow the normal distribution as in Eq. (4.2.5), the OLS estimators have
the following properties (Appendix A provides a general discussion of the desirable statistical properties of 
estimators): 

1. They are unbiased. 

2. They have minimum variance. Combined with 1, this means that they are minimum-variance unbiased, 
or efficient estimators. 

3. They have consistency; that is, as the sample size increases indefinitely, the estimators converge to their 
true population values. 

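4. $\hat\beta_1$ (being a linear function of $u_i$) is normally distributed with

$$\text{Mean:}\quad E(\hat\beta_1) = \beta_1 \tag{4.3.1}$$
$$\operatorname{var}(\hat\beta_1):\quad \sigma^2_{\hat\beta_1} = \frac{\sum X_i^2}{n \sum x_i^2}\,\sigma^2 = \text{(3.3.3)} \tag{4.3.2}$$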


²For the various forms of the CLT, see Harald Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton, NJ, 1946, Chap. 17.

³For a technical discussion on this point, see Christiaan Heij et al., Econometric Methods with Applications in Business and Economics, Oxford University Press, Oxford, 2004, p. 197.
Or, more compactly,

$$\hat\beta_1 \sim N\left(\beta_1, \sigma^2_{\hat\beta_1}\right)$$

Then by the properties of the normal distribution, the variable $Z$, which is defined as

$$Z = \frac{\hat\beta_1 - \beta_1}{\sigma_{\hat\beta_1}} \tag{4.3.3}$$

follows the standard normal distribution, that is, a normal distribution with zero mean and unit (= 1) variance, or

$$Z \sim N(0, 1)$$


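5. $\hat\beta_2$ (being a linear function of $u_i$) is normally distributed with

$$\text{Mean:}\quad E(\hat\beta_2) = \beta_2 \tag{4.3.4}$$
$$\operatorname{var}(\hat\beta_2):\quad \sigma^2_{\hat\beta_2} = \frac{\sigma^2}{\sum x_i^2} = \text{(3.3.1)} \tag{4.3.5}$$

Or, more compactly,

$$\hat\beta_2 \sim N\left(\beta_2, \sigma^2_{\hat\beta_2}\right)$$

Then, as in Eq. (4.3.3),

$$Z = \frac{\hat\beta_2 - \beta_2}{\sigma_{\hat\beta_2}} \tag{4.3.6}$$

also follows the standard normal distribution.

Geometrically, the probability distributions of $\hat\beta_1$ and $\hat\beta_2$ are shown in Figure 4.1.

Figure 4.1 Probability distributions of $\hat\beta_1$ and $\hat\beta_2$ (normal densities $f(\hat\beta_1)$ and $f(\hat\beta_2)$ centered at $E(\hat\beta_1) = \beta_1$ and $E(\hat\beta_2) = \beta_2$).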
6. $(n - 2)(\hat\sigma^2/\sigma^2)$ is distributed as the $\chi^2$ (chi-square) distribution with $(n - 2)$ df.⁴ This knowledge will help us to draw inferences about the true $\sigma^2$ from the estimated $\hat\sigma^2$, as we will show in Chapter 5. (The chi-square distribution and its properties are discussed in Appendix A.)

7. $(\hat\beta_1, \hat\beta_2)$ are distributed independently of $\hat\sigma^2$. The importance of this will be explained in the next chapter.

8. $\hat\beta_1$ and $\hat\beta_2$ have minimum variance in the entire class of unbiased estimators, whether linear or not. This result, due to Rao, is very powerful because, unlike the Gauss–Markov theorem, it is not restricted to the class of linear estimators only.⁵ Therefore, we can say that the least-squares estimators are best unbiased estimators (BUE); that is, they have minimum variance in the entire class of unbiased estimators.

To sum up: The important point to note is that the normality assumption enables us to derive the probability, or sampling, distributions of $\hat\beta_1$ and $\hat\beta_2$ (both normal) and $\hat\sigma^2$ (related to the chi square). As we will see in the next chapter, this simplifies the task of establishing confidence intervals and testing (statistical) hypotheses.
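Properties 4 through 6 can be checked by simulation. The Python sketch below draws many samples from a model with normal disturbances. It compares the simulated means and variances of $\hat\beta_1$ and $\hat\beta_2$ with Eqs. (4.3.1), (4.3.2), (4.3.4), and (4.3.5). It also compares the simulated mean and variance of $(n - 2)\hat\sigma^2/\sigma^2$ with those of a $\chi^2$ variable with $n - 2$ df, namely $n - 2$ and $2(n - 2)$. The regressor values, parameters, number of replications, and seed are hypothetical.

```python
import random
import statistics


def ols_fit(x, y):
    """Return (b1, b2, sigma2_hat) with sigma2_hat = RSS / (n - 2), the unbiased OLS estimator."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, b2, rss / (n - 2)


X = [float(v) for v in range(1, 13)]   # hypothetical fixed regressor, n = 12
BETA1, BETA2, SIGMA = 2.0, 0.5, 1.0    # hypothetical true parameters
n = len(X)
xbar = sum(X) / n
sxx = sum((xi - xbar) ** 2 for xi in X)
var_b1_theory = sum(xi ** 2 for xi in X) / (n * sxx) * SIGMA ** 2   # Eq. (4.3.2)
var_b2_theory = SIGMA ** 2 / sxx                                    # Eq. (4.3.5)

rng = random.Random(7)
b1s, b2s, scaled = [], [], []
for _ in range(5000):
    y = [BETA1 + BETA2 * xi + rng.gauss(0.0, SIGMA) for xi in X]
    b1, b2, s2 = ols_fit(X, y)
    b1s.append(b1)
    b2s.append(b2)
    scaled.append((n - 2) * s2 / SIGMA ** 2)   # property 6: behaves like chi-square with n - 2 df

print(f"mean b1 {statistics.fmean(b1s):.3f} vs {BETA1}; var b1 {statistics.variance(b1s):.4f} vs {var_b1_theory:.4f}")
print(f"mean b2 {statistics.fmean(b2s):.4f} vs {BETA2}; var b2 {statistics.variance(b2s):.5f} vs {var_b2_theory:.5f}")
print(f"(n-2)s2/sigma2: mean {statistics.fmean(scaled):.2f} vs {n - 2}; var {statistics.variance(scaled):.2f} vs {2 * (n - 2)}")
```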

In passing, note that, with the assumption that $u_i \sim N(0, \sigma^2)$, $Y_i$, being a linear function of $u_i$, is itself normally distributed with the mean and variance given by

$$E(Y_i) = \beta_1 + \beta_2 X_i \tag{4.3.7}$$
$$\operatorname{var}(Y_i) = \sigma^2 \tag{4.3.8}$$

More neatly, we can write

$$Y_i \sim N(\beta_1 + \beta_2 X_i, \sigma^2) \tag{4.3.9}$$


4.4 The Method of Maximum Likelihood (ML) 


A method of point estimation with some stronger theoretical properties than the method of OLS is the method of maximum likelihood (ML). Since this method is slightly involved, it is discussed in the appendix to this chapter. For the general reader, it will suffice to note that if the $u_i$ are assumed to be normally distributed, as we have done for reasons already discussed, the ML and OLS estimators of the regression coefficients, the $\beta$'s, are identical, and this is true of simple as well as multiple regressions. The ML estimator of $\sigma^2$ is $\sum \hat u_i^2/n$. This estimator is biased, whereas the OLS estimator of $\sigma^2$, $\hat\sigma^2 = \sum \hat u_i^2/(n - 2)$, as we have seen, is unbiased. But comparing these two estimators of $\sigma^2$, we see that as the sample size $n$ gets larger the two estimators of $\sigma^2$ tend to be equal. Thus, asymptotically (i.e., as $n$ increases indefinitely), the ML estimator of $\sigma^2$ is also unbiased.
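A minimal sketch of the two variance estimators follows. Both use the same OLS residual sum of squares and differ only in the divisor, so their ratio is exactly $(n - 2)/n$ and approaches 1 as $n$ grows. The data-generating values below are hypothetical.

```python
import random


def residual_ss(x, y):
    """Residual sum of squares from the OLS (= ML) fit of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


def variance_estimates(x, y):
    """Return (OLS estimator RSS / (n - 2), ML estimator RSS / n)."""
    n = len(x)
    rss = residual_ss(x, y)
    return rss / (n - 2), rss / n


rng = random.Random(11)
results = {}
for n in (10, 100, 1000):
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]            # hypothetical regressor
    y = [1.0 + 0.5 * xi + rng.gauss(0.0, 2.0) for xi in x]    # hypothetical model, sigma^2 = 4
    ols_s2, ml_s2 = variance_estimates(x, y)
    results[n] = (ols_s2, ml_s2)
    print(f"n={n:5d}  OLS: {ols_s2:.4f}  ML: {ml_s2:.4f}  ratio ML/OLS: {ml_s2 / ols_s2:.4f}")
```

The printed ratios are 0.8000, 0.9800, and 0.9980, which is exactly $(n - 2)/n$. The estimates themselves vary with the random draws.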

Since the method of least squares with the added assumption of normality of $u_i$ provides us with all the tools necessary for both estimation and hypothesis testing of the linear regression models, there is no loss for readers who may not want to pursue the maximum likelihood method because of its slight mathematical complexity.


⁴The proof of this statement is slightly involved. An accessible source for the proof is Robert V. Hogg and Allen T. Craig, Introduction to Mathematical Statistics, 2d ed., Macmillan, New York, 1965, p. 144.
⁵C. R. Rao, Linear Statistical Inference and Its Applications, John Wiley & Sons, New York, 1965, p. 258.
Summary and Conclusions


1. This chapter discussed the classical normal linear regression model (CNLRM). 

2. This model differs from the classical linear regression model (CLRM) in that it specifically assumes that the disturbance term $u_i$ entering the regression model is normally distributed. The CLRM does not require any assumption about the probability distribution of $u_i$; it only requires that the mean value of $u_i$ is zero and its variance is a finite constant.

3. The theoretical justification for the normality assumption is the central limit theorem. 

4. Without the normality assumption, under the other assumptions discussed in Chapter 3, the Gauss–Markov theorem showed that the OLS estimators are BLUE.

5. With the additional assumption of normality, the OLS estimators are not only best unbiased estimators (BUE) but also follow well-known probability distributions. The OLS estimators of the intercept and slope are themselves normally distributed, and the OLS estimator of the variance of $u_i$ ($= \hat\sigma^2$) is related to the chi-square distribution.

6. In Chapters 5 and 8 we show how this knowledge is useful in drawing inferences about the values of 
the population parameters. 

7. An alternative to the least-squares method is the method of maximum likelihood (ML). To use this method, however, one must make an assumption about the probability distribution of the disturbance term $u_i$. In the regression context, the assumption most popularly made is that $u_i$ follows the normal distribution.

8. Under the normality assumption, the ML and OLS estimators of the intercept and slope parameters of the regression model are identical. However, the OLS and ML estimators of the variance of $u_i$ are different. In large samples, however, these two estimators converge.

9. Thus the ML method is generally called a large-sample method. The ML method is of broader 
application in that it can also be applied to regression models that are nonlinear in the parameters. In 
the latter case, OLS is generally not used. For more on this, see Chapter 14. 

10. In this text, we will largely rely on the OLS method for practical reasons: (a) compared to ML, OLS is easy to apply; (b) the ML and OLS estimators of $\beta_1$ and $\beta_2$ are identical (which is true of multiple regressions too); and (c) even in moderately large samples the OLS and ML estimators of $\sigma^2$ do not differ vastly.

However, for the benefit of the mathematically inclined reader, a brief introduction to ML is given in the appendix to this chapter and also in Appendix A.


Appendix 4A 


4A.1 Maximum Likelihood Estimation of Two-Variable Regression Model


Assume that in the two-variable model $Y_i = \beta_1 + \beta_2 X_i + u_i$ the $Y_i$ are normally and independently distributed with mean $= \beta_1 + \beta_2 X_i$ and variance $= \sigma^2$. (See Eq. [4.3.9].) As a result, the joint probability density function of $Y_1, Y_2, \ldots, Y_n$, given the preceding mean and variance, can be written as

$$f(Y_1, Y_2, \ldots, Y_n \mid \beta_1 + \beta_2 X_i, \sigma^2)$$
But in view of the independence of the $Y$'s, this joint probability density function can be written as a product of $n$ individual density functions as

$$f(Y_1, Y_2, \ldots, Y_n \mid \beta_1 + \beta_2 X_i, \sigma^2) = f(Y_1 \mid \beta_1 + \beta_2 X_i, \sigma^2)\, f(Y_2 \mid \beta_1 + \beta_2 X_i, \sigma^2) \cdots f(Y_n \mid \beta_1 + \beta_2 X_i, \sigma^2) \tag{1}$$

where

$$f(Y_i) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2}\right\} \tag{2}$$
which is the density function of a normally distributed variable with the given mean and variance. 
(Note: exp means e to the power of the expression indicated by { }.) 
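Substituting Eq. (2) for each $Y_i$ into Eq. (1) gives

$$f(Y_1, Y_2, \ldots, Y_n \mid \beta_1 + \beta_2 X_i, \sigma^2) = \frac{1}{\sigma^n \left(\sqrt{2\pi}\right)^n} \exp\left\{-\frac{1}{2}\sum\frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2}\right\} \tag{3}$$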


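If $Y_1, Y_2, \ldots, Y_n$ are known or given, but $\beta_1$, $\beta_2$, and $\sigma^2$ are not known, the function in Eq. (3) is called a likelihood function, denoted by $\text{LF}(\beta_1, \beta_2, \sigma^2)$, and written as¹

$$\text{LF}(\beta_1, \beta_2, \sigma^2) = \frac{1}{\sigma^n \left(\sqrt{2\pi}\right)^n} \exp\left\{-\frac{1}{2}\sum\frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2}\right\} \tag{4}$$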

The method of maximum likelihood, as the name indicates, consists in estimating the unknown parameters in such a manner that the probability of observing the given $Y$'s is as high (or maximum) as possible. Therefore, we have to find the maximum of the function in Eq. (4). This is a straightforward exercise in differential calculus. For differentiation it is easier to express Eq. (4) in log form as follows.² (Note: ln = natural log.)

$$\begin{aligned} \ln \text{LF} &= -n \ln \sigma - \frac{n}{2}\ln(2\pi) - \frac{1}{2}\sum\frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2} \\ &= -\frac{n}{2}\ln \sigma^2 - \frac{n}{2}\ln(2\pi) - \frac{1}{2}\sum\frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2} \end{aligned} \tag{5}$$
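Differentiating Eq. (5) partially with respect to $\beta_1$, $\beta_2$, and $\sigma^2$, we obtain

$$\frac{\partial \ln \text{LF}}{\partial \beta_1} = -\frac{1}{\sigma^2}\sum (Y_i - \beta_1 - \beta_2 X_i)(-1) \tag{6}$$
$$\frac{\partial \ln \text{LF}}{\partial \beta_2} = -\frac{1}{\sigma^2}\sum (Y_i - \beta_1 - \beta_2 X_i)(-X_i) \tag{7}$$
$$\frac{\partial \ln \text{LF}}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum (Y_i - \beta_1 - \beta_2 X_i)^2 \tag{8}$$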


¹Of course, if $\beta_1$, $\beta_2$, and $\sigma^2$ are known but the $Y_i$ are not known, Eq. (4) represents the joint probability density function: the probability of jointly observing the $Y_i$.
²Since a log function is a monotonic function, ln LF will attain its maximum value at the same point as LF.
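Setting these equations equal to zero (the first-order condition for optimization) and letting $\tilde\beta_1$, $\tilde\beta_2$, and $\tilde\sigma^2$ denote the ML estimators, we obtain³

$$\frac{1}{\tilde\sigma^2}\sum (Y_i - \tilde\beta_1 - \tilde\beta_2 X_i) = 0 \tag{9}$$
$$\frac{1}{\tilde\sigma^2}\sum (Y_i - \tilde\beta_1 - \tilde\beta_2 X_i) X_i = 0 \tag{10}$$
$$-\frac{n}{2\tilde\sigma^2} + \frac{1}{2\tilde\sigma^4}\sum (Y_i - \tilde\beta_1 - \tilde\beta_2 X_i)^2 = 0 \tag{11}$$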

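After simplifying, Eqs. (9) and (10) yield

$$\sum Y_i = n\tilde\beta_1 + \tilde\beta_2 \sum X_i \tag{12}$$
$$\sum Y_i X_i = \tilde\beta_1 \sum X_i + \tilde\beta_2 \sum X_i^2 \tag{13}$$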


which are precisely the normal equations of the least-squares theory obtained in Eqs. (3.1.4) and (3.1.5). Therefore, the ML estimators, the $\tilde\beta$'s, are the same as the OLS estimators, the $\hat\beta$'s, given in Eqs. (3.1.6) and (3.1.7). This equality is not accidental. Examining the log likelihood (5), we see that the last term enters with a negative sign. Therefore, maximizing Eq. (5) amounts to minimizing this term, which is precisely the least-squares approach, as can be seen from Eq. (3.1.2).
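The equivalence can be verified numerically. Evaluate the log likelihood (5) at the OLS estimates together with the ML variance $\sum \hat u_i^2/n$. Then perturb one parameter at a time and confirm that ln LF falls. The Python sketch below also checks the first-order conditions (9) and (10). The data are simulated from hypothetical parameter values, and the perturbation sizes are arbitrary.

```python
import math
import random


def log_likelihood(b1, b2, s2, x, y):
    """Eq. (5): ln LF = -(n/2) ln s2 - (n/2) ln(2 pi) - (1/2) * sum((y - b1 - b2 x)^2) / s2."""
    n = len(x)
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(s2) - 0.5 * n * math.log(2 * math.pi) - 0.5 * rss / s2


def ml_estimates(x, y):
    """OLS intercept and slope (identical to ML) plus the ML variance RSS / n."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, b2, rss / n


rng = random.Random(5)
x = [float(v) for v in range(1, 31)]                       # hypothetical regressor
y = [4.0 + 1.5 * xi + rng.gauss(0.0, 3.0) for xi in x]     # hypothetical true model
b1, b2, s2 = ml_estimates(x, y)
best = log_likelihood(b1, b2, s2, x, y)

# First-order conditions (9) and (10): residuals sum to zero and are orthogonal to X.
resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
foc9, foc10 = sum(resid), sum(r * xi for r, xi in zip(resid, x))

# Moving any single parameter away from the estimates lowers the log likelihood.
perturbed = []
for d in (-0.1, -0.01, 0.01, 0.1):
    perturbed += [log_likelihood(b1 + d, b2, s2, x, y),
                  log_likelihood(b1, b2 + d, s2, x, y),
                  log_likelihood(b1, b2, s2 * (1 + d), x, y)]
print(f"ln LF at the estimates: {best:.4f}; largest perturbed value: {max(perturbed):.4f}")
```

Because the residual sum of squares equals $n$ times the ML variance at the optimum, the maximized value simplifies to $-\frac{n}{2}[\ln(2\pi) + \ln\tilde\sigma^2 + 1]$. The test lines check this simplification.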

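Substituting the ML (= OLS) estimators into Eq. (11) and simplifying, we obtain the ML estimator of $\sigma^2$ as

$$\begin{aligned} \tilde\sigma^2 &= \frac{1}{n}\sum (Y_i - \tilde\beta_1 - \tilde\beta_2 X_i)^2 \\ &= \frac{1}{n}\sum (Y_i - \hat\beta_1 - \hat\beta_2 X_i)^2 \\ &= \frac{1}{n}\sum \hat u_i^2 \end{aligned} \tag{14}$$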

From Eq. (14) it is obvious that the ML estimator $\tilde\sigma^2$ differs from the OLS estimator $\hat\sigma^2 = [1/(n - 2)]\sum \hat u_i^2$, which was shown to be an unbiased estimator of $\sigma^2$ in Appendix 3A, Section 3A.5. Thus, the ML estimator of $\sigma^2$ is biased. The magnitude of this bias can be easily determined as follows.

Taking the mathematical expectation of Eq. (14) on both sides, we obtain


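$$\begin{aligned} E(\tilde\sigma^2) &= \frac{1}{n} E\left(\sum \hat u_i^2\right) \\ &= \left(\frac{n - 2}{n}\right)\sigma^2 \qquad \text{using Eq. (16) of Appendix 3A, Section 3A.5} \\ &= \sigma^2 - \frac{2}{n}\sigma^2 \end{aligned} \tag{15}$$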


³We use ~ (tilde) for ML estimators and ^ (cap or hat) for OLS estimators.
which shows that $\tilde\sigma^2$ is biased downward (i.e., it underestimates the true $\sigma^2$) in small samples. But notice that as $n$, the sample size, increases indefinitely, the second term in Eq. (15), the bias factor, tends to zero. Therefore, asymptotically (i.e., in a very large sample), $\tilde\sigma^2$ is unbiased too; that is, $\lim E(\tilde\sigma^2) = \sigma^2$ as $n \to \infty$. It can further be proved that $\tilde\sigma^2$ is also a consistent estimator⁴; that is, as $n$ increases indefinitely, $\tilde\sigma^2$ converges to its true value $\sigma^2$.
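Eq. (15) can be checked by simulation. Averaging $\tilde\sigma^2$ over many samples should give roughly $(n - 2)\sigma^2/n$, and the gap to $\sigma^2$ should shrink as $n$ grows. The sample sizes, parameter values, replication count, and seed in this Python sketch are hypothetical.

```python
import random
import statistics


def ml_sigma2(x, y):
    """ML estimator of sigma^2, Eq. (14): RSS / n from the OLS (= ML) fit."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / n


def mean_ml_sigma2(n, sigma2=4.0, reps=4000, seed=3):
    """Average of the ML variance estimator over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    x = [float(v) for v in range(1, n + 1)]      # hypothetical fixed regressor
    sigma = sigma2 ** 0.5
    draws = []
    for _ in range(reps):
        y = [1.0 + 2.0 * xi + rng.gauss(0.0, sigma) for xi in x]
        draws.append(ml_sigma2(x, y))
    return statistics.fmean(draws)


if __name__ == "__main__":
    for n in (5, 50):
        print(f"n={n:3d}  simulated E(ML sigma^2) = {mean_ml_sigma2(n):.3f}  "
              f"Eq. (15) value = {(n - 2) / n * 4.0:.3f}  true sigma^2 = 4.0")
```

With $\sigma^2 = 4$, Eq. (15) predicts 2.4 for $n = 5$ and 3.84 for $n = 50$. The bias term $2\sigma^2/n$ falls from 1.6 to 0.16.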


4A.2 Maximum Likelihood Estimation of Food Expenditure in India 


Return to Example 3.2 and Eq. (3.7.2), which gives the regression of food expenditure on total expenditure for 55 rural households in India. Since under the normality assumption the OLS and ML estimators of the regression coefficients are the same, we obtain the ML estimators as $\tilde\beta_1 = \hat\beta_1 = 94.2087$ and $\tilde\beta_2 = \hat\beta_2 = 0.4368$. The OLS estimator of $\sigma^2$ is $\hat\sigma^2 = 4469.6913$, but the ML estimator is $\tilde\sigma^2 = 4307.1563$, which is smaller than the OLS estimator. As noted, in small samples the ML estimator is downward biased; that is, on average it underestimates the true variance $\sigma^2$. Of course, as you would expect, as the sample size gets bigger, the difference between the two estimators will narrow. Putting the values of the estimators in the log likelihood function, we obtain the value of −308.1625. If you want the maximum value of the LF, just take the antilog of −308.1625. No other values of the parameters will give you a higher probability of obtaining the sample that you have used in the analysis.
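The figures in this example can be tied together with Eqs. (5) and (14). Because $\tilde\sigma^2 = \sum\hat u_i^2/n$ and $\hat\sigma^2 = \sum\hat u_i^2/(n - 2)$, the ML estimate follows from the OLS one as $\tilde\sigma^2 = (n - 2)\hat\sigma^2/n$. At the maximum, $\sum\hat u_i^2/\tilde\sigma^2 = n$, so Eq. (5) reduces to $\ln \text{LF} = -\frac{n}{2}[\ln(2\pi) + \ln\tilde\sigma^2 + 1]$. The short Python sketch below uses only $n = 55$ and $\hat\sigma^2 = 4469.6913$ from the text.

```python
import math

n = 55                      # rural households in Example 3.2
sigma2_ols = 4469.6913      # OLS estimate of sigma^2 reported in the text

# Same residual sum of squares, different divisors: RSS/(n - 2) versus RSS/n.
rss = sigma2_ols * (n - 2)
sigma2_ml = rss / n

# Eq. (5) evaluated at the ML estimates, where RSS / sigma2_ml equals n.
max_log_lf = -0.5 * n * math.log(sigma2_ml) - 0.5 * n * math.log(2 * math.pi) - 0.5 * rss / sigma2_ml

print(f"ML estimate of sigma^2: {sigma2_ml:.4f}")
print(f"maximized log likelihood: {max_log_lf:.4f}")
```

Under these relations, the ML estimate comes to about 4307.16 and the maximized log likelihood to about −308.16. The latter matches the log-likelihood value quoted in the example.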


Appendix 4A Exercises 


4.1. “If two random variables are statistically independent, the coefficient of correlation between the two is zero. But the converse is not necessarily true; that is, zero correlation does not imply statistical independence. However, if two variables are normally distributed, zero correlation necessarily implies statistical independence.” Verify this statement for the following joint probability density function of two normally distributed variables $Y_1$ and $Y_2$ (this joint probability density function is known as the bivariate normal probability density function):

$$f(Y_1, Y_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{Y_1 - \mu_1}{\sigma_1}\right)^2 - 2\rho\frac{(Y_1 - \mu_1)(Y_2 - \mu_2)}{\sigma_1\sigma_2} + \left(\frac{Y_2 - \mu_2}{\sigma_2}\right)^2\right]\right\}$$

where $\mu_1$ = mean of $Y_1$
$\mu_2$ = mean of $Y_2$
$\sigma_1$ = standard deviation of $Y_1$
$\sigma_2$ = standard deviation of $Y_2$
$\rho$ = coefficient of correlation between $Y_1$ and $Y_2$
4.2. By applying the second-order conditions for optimization (i.e., second-derivative test), show that the ML estimators of $\beta_1$, $\beta_2$, and $\sigma^2$ obtained by solving Eqs. (9), (10), and (11) do in fact maximize the likelihood function in Eq. (4).


⁴See Appendix A for a general discussion of the properties of the maximum likelihood estimators as well as for the distinction between asymptotic unbiasedness and consistency. Roughly speaking, in asymptotic unbiasedness we try to find out the $\lim E(\tilde\sigma^2)$ as $n$ tends to infinity, where $n$ is the sample size on which the estimator is based, whereas in consistency we try to find out how $\tilde\sigma^2$ behaves as $n$ increases indefinitely. Notice that the unbiasedness property is a repeated sampling property of an estimator based on a sample of given size, whereas in consistency we are concerned with the behavior of an estimator as the sample size increases indefinitely.
4.3. A random variable $X$ follows the exponential distribution if it has the following probability density function (PDF):

$$f(X) = \begin{cases} (1/\theta)\, e^{-X/\theta} & \text{for } X > 0 \\ 0 & \text{elsewhere} \end{cases}$$

where $\theta > 0$ is the parameter of the distribution. Using the ML method, show that the ML estimator of $\theta$ is $\hat\theta = \sum X_i/n$, where $n$ is the sample size. That is, show that the ML estimator of $\theta$ is the sample mean $\bar X$.

4.4. Suppose that the outcome of an experiment is classified as either a success or a failure. Letting $X = 1$ when the outcome is a success and $X = 0$ when it is a failure, the probability density, or mass, function of $X$ is given by

$$P(X = 0) = 1 - p$$
$$P(X = 1) = p, \quad 0 < p < 1$$

What is the maximum likelihood estimator of $p$, the probability of success?


CHAPTER 5


Two-Variable Regression: 
Interval Estimation and 
Hypothesis Testing 


Beware of testing too many hypotheses; the more you torture the data, the more likely they are to confess, but confession obtained under duress may not be admissible in the court of scientific opinion.¹


As pointed out in Chapter 4, estimation and hypothesis testing constitute the two major branches of classical 
statistics. The theory of estimation consists of two parts: point estimation and interval estimation. We have 
discussed point estimation thoroughly in the previous two chapters where we introduced the OLS and ML 
methods of point estimation. In this chapter we first consider interval estimation and then take up the topic of 
hypothesis testing, a topic intimately related to interval estimation. 


5.1 Statistical Prerequisites 


Before we demonstrate the actual mechanics of establishing confidence intervals and testing statistical 
hypotheses, it is assumed that the reader is familiar with the fundamental concepts of probability and statistics. 
Although not a substitute for a basic course in statistics, Appendix A provides the essentials of statistics with 
which the reader should be totally familiar. Key concepts such as probability, probability distributions, 
Type I and Type II errors, level of significance, power of a statistical test, and confidence interval are 
crucial for understanding the material covered in this and the following chapters. 


5.2 Interval Estimation: Some Basic Ideas 


To fix the ideas, consider the wages-education example of Chapter 3. Equation (3.6.1) shows that the estimated average increase in mean hourly wage related to a one-year increase in schooling ($\hat\beta_2$) is 0.7240, which is a


¹Stephen M. Stigler, “Testing Hypothesis or Fitting Models? Another Look at Mass Extinctions,” in Matthew H. Nitecki and Antoni Hoffman, eds., Neutral Models in Biology, Oxford University Press, Oxford, 1987, p. 148.
one number (point) estimate of the unknown population value $\beta_2$. How reliable is this estimate? As noted in Chapter 3, because of sampling fluctuations, a single estimate is likely to differ from the true value, although in repeated sampling its mean value is expected to be equal to the true value. [Note: $E(\hat\beta_2) = \beta_2$.] Now in statistics, the reliability of a point estimator is measured by its standard error. Therefore, instead of relying on the point estimate alone, we may construct an interval around the point estimator, say within two or three standard errors on either side of the point estimator, such that this interval has, say, 95 percent probability of including the true parameter value. This is roughly the idea behind interval estimation.

To be more specific, assume that we want to find out how “close,” say, $\hat\beta_2$ is to $\beta_2$. For this purpose we try to find out two positive numbers $\delta$ and $\alpha$, the latter lying between 0 and 1, such that the probability that the random interval $(\hat\beta_2 - \delta, \hat\beta_2 + \delta)$ contains the true $\beta_2$ is $1 - \alpha$. Symbolically,

$$\Pr(\hat\beta_2 - \delta \le \beta_2 \le \hat\beta_2 + \delta) = 1 - \alpha \tag{5.2.1}$$


Such an interval, if it exists, is known as a confidence interval; $1 - \alpha$ is known as the confidence coefficient; and $\alpha$ ($0 < \alpha < 1$) is known as the level of significance.² The endpoints of the confidence interval are known as the confidence limits (also known as critical values), $\hat\beta_2 - \delta$ being the lower confidence limit and $\hat\beta_2 + \delta$ the upper confidence limit. In passing, note that in practice $\alpha$ and $1 - \alpha$ are often expressed in percentage forms as $100\alpha$ and $100(1 - \alpha)$ percent.

Equation (5.2.1) shows that an interval estimator, in contrast to a point estimator, is an interval constructed in such a manner that it has a specified probability $1 - \alpha$ of including within its limits the true value of the parameter. For example, if $\alpha = 0.05$, or 5 percent, Eq. (5.2.1) would read: The probability that the (random) interval shown there includes the true $\beta_2$ is 0.95, or 95 percent. The interval estimator thus gives a range of values within which the true $\beta_2$ may lie.

It is very important to know the following aspects of interval estimation: 

1. Equation (5.2.1) does not say that the probability of $\beta_2$ lying between the given limits is $1 - \alpha$. Since $\beta_2$, although an unknown, is assumed to be some fixed number, either it lies in the interval or it does not. What Eq. (5.2.1) states is that, for the method described in this chapter, the probability of constructing an interval that contains $\beta_2$ is $1 - \alpha$.

2. The interval in Eq. (5.2.1) is a random interval; that is, it will vary from one sample to the next because it is based on $\hat\beta_2$, which is random. (Why?)

3. Since the confidence interval is random, the probability statements attached to it should be understood in the long-run sense, that is, repeated sampling. More specifically, Eq. (5.2.1) means: If in repeated sampling confidence intervals like it are constructed a great many times on the $1 - \alpha$ probability basis, then, in the long run, on the average, such intervals will enclose in $1 - \alpha$ of the cases the true value of the parameter.

4. As noted in (2), the interval in Eq. (5.2.1) is random so long as $\hat\beta_2$ is not known. But once we have a specific sample and once we obtain a specific numerical value of $\hat\beta_2$, the interval in Eq. (5.2.1) is no longer random; it is fixed. In this case, we cannot make the probabilistic statement in Eq. (5.2.1); that is, we cannot say that the probability is $1 - \alpha$ that a given fixed interval includes the true $\beta_2$. In this situation, $\beta_2$ is either in the fixed interval or outside it. Therefore, the probability is either 1 or 0. Thus, for our wages-education example, if the 95 percent confidence interval were obtained as $(0.5700 \le \beta_2 \le 0.8780)$, as we do shortly in Eq. (5.3.9), we cannot say the probability is 95 percent that this interval includes the true $\beta_2$. That probability is either 1 or 0.
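The repeated-sampling interpretation in point 3 can be made concrete by simulation. The sketch below constructs the interval $\hat\beta_2 \pm t_{\alpha/2}\,\text{se}(\hat\beta_2)$ of Eq. (5.3.6) (derived in Section 5.3) in thousands of hypothetical samples and counts how often it covers the true $\beta_2$. The sample size $n = 20$, the parameters, the replication count, and the seed are hypothetical. The critical value 2.101 is the tabulated 5 percent two-tailed $t$ value for 18 df, hard-coded to keep the sketch free of third-party libraries.

```python
import math
import random

T_CRIT_18DF = 2.101   # tabulated two-sided 5% critical t value for n - 2 = 18 df


def slope_ci(x, y, t_crit):
    """Interval b2 -/+ t_crit * se(b2), with se(b2) = sigma_hat / sqrt(sum x_i^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    sigma2_hat = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b2 = math.sqrt(sigma2_hat / sxx)
    return b2 - t_crit * se_b2, b2 + t_crit * se_b2


def coverage(reps=4000, beta1=1.0, beta2=0.5, sigma=2.0, seed=2024):
    """Share of intervals, one per simulated sample, that contain the true beta2."""
    rng = random.Random(seed)
    x = [float(v) for v in range(1, 21)]   # hypothetical fixed regressor, n = 20
    hits = 0
    for _ in range(reps):
        y = [beta1 + beta2 * xi + rng.gauss(0.0, sigma) for xi in x]
        lo, hi = slope_ci(x, y, T_CRIT_18DF)
        hits += lo <= beta2 <= hi
    return hits / reps


if __name__ == "__main__":
    print(f"share of 95% intervals covering beta2: {coverage():.3f}")
```

Any single interval either contains $\beta_2$ or it does not (point 4). The 95 percent describes the long-run share of intervals produced by the procedure.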

How are the confidence intervals constructed? From the preceding discussion one may expect that if 
the sampling or probability distributions of the estimators are known, one can make confidence interval 


²Also known as the probability of committing a Type I error. A Type I error consists in rejecting a true hypothesis, whereas a Type II error consists in accepting a false hypothesis. (This topic is discussed more fully in Appendix A.) The symbol $\alpha$ is also known as the size of the (statistical) test.
statements such as Eq. (5.2.1). In Chapter 4 we saw that under the assumption of normality of the disturbances $u_i$ the OLS estimators $\hat\beta_1$ and $\hat\beta_2$ are themselves normally distributed and that the OLS estimator $\hat\sigma^2$ is related to the $\chi^2$ (chi-square) distribution. It would then seem that the task of constructing confidence intervals is a simple one. And it is!


5.3 Confidence Intervals for Regression Coefficients $\beta_1$ and $\beta_2$


Confidence Interval for $\beta_2$


It was shown in Chapter 4, Section 4.3, that, with the normality assumption for $u_i$, the OLS estimators $\hat\beta_1$ and $\hat\beta_2$ are themselves normally distributed with means and variances given therein. Therefore, for example, the variable

$$Z = \frac{\hat\beta_2 - \beta_2}{\text{se}(\hat\beta_2)} = \frac{(\hat\beta_2 - \beta_2)\sqrt{\sum x_i^2}}{\sigma} \tag{5.3.1}$$

as noted in Eq. (4.3.6), is a standardized normal variable. It therefore seems that we can use the normal distribution to make probabilistic statements about $\beta_2$, provided the true population variance $\sigma^2$ is known. If $\sigma^2$ is known, we can exploit an important property of a normally distributed variable with mean $\mu$ and variance $\sigma^2$: the area under the normal curve between $\mu \pm \sigma$ is about 68 percent, that between the limits $\mu \pm 2\sigma$ is about 95 percent, and that between $\mu \pm 3\sigma$ is about 99.7 percent.
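The 68, 95, and 99.7 percent figures follow from the standard normal distribution. For any normal variable, $\Pr(\mu - k\sigma \le X \le \mu + k\sigma) = \Pr(|Z| \le k) = \operatorname{erf}(k/\sqrt{2})$. The short Python check below uses only the standard library's `math.erf`.

```python
import math


def area_within(k):
    """Pr(|Z| <= k) for a standard normal Z, equal to erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"mu +/- {k} sigma: {100 * area_within(k):.2f} percent")
```

The output, 68.27, 95.45, and 99.73 percent, matches the rounded figures in the text.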

But $\sigma^2$ is rarely known, and in practice it is estimated by the unbiased estimator $\hat\sigma^2$. If we replace $\sigma$ by $\hat\sigma$, Eq. (5.3.1) may be written as

$$t = \frac{\hat\beta_2 - \beta_2}{\text{se}(\hat\beta_2)} = \frac{\text{Estimator} - \text{Parameter}}{\text{Estimated standard error of estimator}} = \frac{(\hat\beta_2 - \beta_2)\sqrt{\sum x_i^2}}{\hat\sigma} \tag{5.3.2}$$

where the se($\hat\beta_2$) now refers to the estimated standard error. It can be shown (see Appendix 5A, Section 5A.2) that the $t$ variable thus defined follows the $t$ distribution with $n - 2$ df. [Note the difference between Eqs. (5.3.1) and (5.3.2).] Therefore, instead of using the normal distribution, we can use the $t$ distribution to establish a confidence interval for $\beta_2$ as follows:

$$\Pr(-t_{\alpha/2} \le t \le t_{\alpha/2}) = 1 - \alpha \tag{5.3.3}$$


where the t value in the middle of this double inequality is the t value given by Equation 5.3.2 and where t_{α/2} is the value of the t variable obtained from the t distribution for α/2 level of significance and n − 2 df; it is often called the critical t value at α/2 level of significance. Substitution of Eq. (5.3.2) into Equation 5.3.3 yields


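$$\Pr\left[-t_{\alpha/2} \le \frac{\hat{\beta}_2 - \beta_2}{\operatorname{se}(\hat{\beta}_2)} \le t_{\alpha/2}\right] = 1 - \alpha \tag{5.3.4}$$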
Rearranging Equation 5.3.4, we obtain³

$$\Pr[\hat{\beta}_2 - t_{\alpha/2}\operatorname{se}(\hat{\beta}_2) \le \beta_2 \le \hat{\beta}_2 + t_{\alpha/2}\operatorname{se}(\hat{\beta}_2)] = 1 - \alpha \tag{5.3.5}$$


Equation 5.3.5 provides a 100(1 − α) percent confidence interval for β₂, which can be written more compactly as

100(1 − α)% confidence interval for β₂:

$$\hat{\beta}_2 \pm t_{\alpha/2}\operatorname{se}(\hat{\beta}_2) \tag{5.3.6}$$


Arguing analogously, and using Eqs. (4.3.1) and (4.3.2), we can then write: 


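$$\Pr[\hat{\beta}_1 - t_{\alpha/2}\operatorname{se}(\hat{\beta}_1) \le \beta_1 \le \hat{\beta}_1 + t_{\alpha/2}\operatorname{se}(\hat{\beta}_1)] = 1 - \alpha \tag{5.3.7}$$

or, more compactly,

100(1 − α)% confidence interval for β₁:

$$\hat{\beta}_1 \pm t_{\alpha/2}\operatorname{se}(\hat{\beta}_1) \tag{5.3.8}$$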


Notice an important feature of the confidence intervals given in Equations 5.3.6 and 5.3.8: In both cases the width of the confidence interval is proportional to the standard error of the estimator. That is, the larger the standard error, the larger is the width of the confidence interval. Put differently, the larger the standard error of the estimator, the greater is the uncertainty of estimating the true value of the unknown parameter. Thus, the standard error of an estimator is often described as a measure of the precision of the estimator (i.e., how precisely the estimator measures the true population value).

Returning to our regression example in Chapter 3 (Section 3.6) of mean hourly wages (Y) on education (X), recall that we found in Table 3.2 that β̂₂ = 0.7240 and se(β̂₂) = 0.0700. Since there are 13 observations, the degrees of freedom (df) are 13 − 2 = 11. If we assume α = 5%, that is, a 95% confidence coefficient, then the t table shows that for 11 df the critical t_{α/2} = 2.201. Substituting these values in Eq. (5.3.5), the reader should verify that the 95 percent confidence interval for β₂ is as follows:⁴


$$0.5700 \le \beta_2 \le 0.8780 \tag{5.3.9}$$

Or, using Eq. (5.3.6), it is

$$0.7240 \pm 2.201(0.0700)$$

that is,

$$0.7240 \pm 0.1540 \tag{5.3.10}$$
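The arithmetic of Eqs. (5.3.6) and (5.3.10) can be written as a small helper. This is a minimal sketch in Python, and the function name `confidence_interval` is our own. The inputs are the slope, standard error, and critical t value quoted in the text.

```python
def confidence_interval(estimate: float, std_error: float, t_crit: float) -> tuple:
    """Interval estimate +/- t_crit * se, following Eq. (5.3.6)."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


# Figures quoted in the text: slope 0.7240, se 0.0700, t_{0.025} with 11 df = 2.201
lower, upper = confidence_interval(0.7240, 0.0700, 2.201)
print(f"95% CI for beta2: ({lower:.4f}, {upper:.4f})")
```

This prints (0.5699, 0.8781). It differs from Eq. (5.3.9) in the fourth decimal only because the text rounds the half-width 2.201 × 0.0700 = 0.15407 to 0.1540.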


The interpretation of this confidence interval is: Given the confidence coefficient of 95 percent, in 95 out of 100 cases intervals like Equation 5.3.9 will contain the true β₂. But, as warned earlier, we cannot say that the probability is 95 percent that the specific interval in Eq. (5.3.9) contains the true β₂. This interval is now fixed and no longer random, so β₂ either lies in it or it does not. The probability that the specified fixed interval includes the true β₂ is therefore 1 or 0.
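The "95 out of 100 cases" statement is a claim about repeated sampling, so it can be illustrated by simulation. The sketch below generates many samples from a two-variable model with normal disturbances and builds the interval β̂₂ ± 2.201 se(β̂₂) for each one. It then counts how often the interval covers the true slope. The regressor values (6 to 18, giving 13 observations), the "true" parameters, the random seed, and the number of replications are all hypothetical choices made for illustration, not data from Table 3.2.

```python
import random


def ols_slope_and_se(x, y):
    """OLS slope and its estimated standard error in the two-variable model."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)  # unbiased estimator of sigma^2
    return b2, (sigma2_hat / sxx) ** 0.5


def coverage(reps: int = 4000, seed: int = 1) -> float:
    """Share of simulated 95% intervals b2 +/- t_crit * se(b2) that contain the true slope."""
    rng = random.Random(seed)
    x = list(range(6, 19))                   # hypothetical regressor: 13 values
    beta1, beta2, sigma = -0.01, 0.72, 0.95  # assumed "true" parameters (illustrative only)
    t_crit = 2.201                           # t_{0.025} for 13 - 2 = 11 df
    hits = 0
    for _ in range(reps):
        y = [beta1 + beta2 * xi + rng.gauss(0.0, sigma) for xi in x]
        b2, se = ols_slope_and_se(x, y)
        if b2 - t_crit * se <= beta2 <= b2 + t_crit * se:
            hits += 1
    return hits / reps


print(f"share of intervals covering beta2: {coverage():.3f}")
```

The share comes out close to 0.95. Each individual interval, however, either contains 0.72 or does not, which is exactly the distinction drawn in the text.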


3Some authors prefer to write Eq. (5.3.5) with the df explicitly indicated. Thus, they would write

$$\Pr[\hat{\beta}_2 - t_{(n-2),\alpha/2}\operatorname{se}(\hat{\beta}_2) \le \beta_2 \le \hat{\beta}_2 + t_{(n-2),\alpha/2}\operatorname{se}(\hat{\beta}_2)] = 1 - \alpha$$

But for simplicity we will stick to our notation; the context clarifies the appropriate df involved.


4Because of rounding errors in Table 3.2, the answers given below may not exactly match the answers obtained from a statistical package.
Following Eq. (5.3.7) and the data in Table 3.2, the reader can easily verify that the 95 percent confidence interval for β₁ for our example is


$$-1.8871 \le \beta_1 \le 1.8583 \tag{5.3.11}$$


Again you should be careful in interpreting this confidence interval. In 95 out of 100 cases, intervals like Equation 5.3.11 will contain the true β₁; the probability that this particular fixed interval includes the true β₁ is either 1 or 0.


Confidence Interval for β₁ and β₂ Simultaneously


There are occasions when one needs to construct a joint confidence interval for β₁ and β₂ such that, with a confidence coefficient (1 − α) of, say, 95 percent, that interval includes β₁ and β₂ simultaneously. Since this topic is involved, the interested reader may want to consult appropriate references.⁵ We will touch on this topic briefly in Chapters 8 and 10.


5.4 Confidence Interval for σ²


As pointed out in Chapter 4, Section 4.3, under the normality assumption, the variable

$$\chi^2 = (n-2)\frac{\hat{\sigma}^2}{\sigma^2} \tag{5.4.1}$$

follows the χ² distribution with n − 2 df.⁶ Therefore, we can use the χ² distribution to establish a confidence interval for σ²:


$$\Pr(\chi^2_{1-\alpha/2} \le \chi^2 \le \chi^2_{\alpha/2}) = 1 - \alpha \tag{5.4.2}$$


where the χ² value in the middle of this double inequality is as given by Equation 5.4.1, and where χ²_{1−α/2} and χ²_{α/2} are two values of χ² (the critical χ² values) obtained from the chi-square table for n − 2 df. They are chosen so that they cut off 100(α/2) percent tail areas of the χ² distribution, as shown in Figure 5.1.


Figure 5.1 The 95% confidence interval for χ² (11 df): the density f(χ²) with critical values χ²_{0.975} = 3.8157 and χ²_{0.025} = 21.9200, each cutting off a 2.5 percent tail.


5For an accessible discussion, see John Neter, William Wasserman, and Michael H. Kutner, Applied Linear Regression Models, Richard D. Irwin, Homewood, Ill., 1983, Chap. 5.

6For proof, see Robert V. Hogg and Allen T. Craig, Introduction to Mathematical Statistics, 2d ed., Macmillan, New York,
1965, p. 144. 
Substituting χ² from Eq. (5.4.1) into Equation 5.4.2 and rearranging the terms, we obtain

$$\Pr\left[(n-2)\frac{\hat{\sigma}^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le (n-2)\frac{\hat{\sigma}^2}{\chi^2_{1-\alpha/2}}\right] = 1 - \alpha \tag{5.4.3}$$


which gives the 100(1 − α)% confidence interval for σ².

Continuing with our wages-education example, we found in Table 3.2 that for our data we have σ̂² = 0.8936. If we choose α of 5%, the chi-square table for 11 df gives the following critical values: χ²_{0.025} = 21.9200 and χ²_{0.975} = 3.8157. These values show that the probability of a chi-square value exceeding 21.9200 is 2.5 percent and that of exceeding 3.8157 is 97.5 percent. Therefore, the interval between these two values is the 95 percent confidence interval for χ², as shown in Figure 5.1. (Note the skewed characteristic of the chi-square distribution.)

Substituting the data of our example into Eq. (5.4.3), the reader can verify that the 95 percent confidence interval for σ² is as follows:

$$0.4484 \le \sigma^2 \le 2.5760 \tag{5.4.4}$$


The interpretation of this interval is: If we establish 95 percent confidence limits on σ² and if we maintain a priori that these limits will include the true σ², we will be right in the long run 95 percent of the time.
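Eq. (5.4.3) can be evaluated directly from the values quoted above. This is a minimal Python sketch, and the helper name `variance_interval` is ours. The chi-square critical values are taken from the text rather than computed.

```python
def variance_interval(sigma2_hat: float, df: int, chi2_upper: float, chi2_lower: float) -> tuple:
    """Eq. (5.4.3): (df * s2 / chi2_{alpha/2}, df * s2 / chi2_{1-alpha/2})."""
    scaled = df * sigma2_hat  # (n - 2) * sigma2_hat
    return scaled / chi2_upper, scaled / chi2_lower


# Values from the text: sigma2_hat = 0.8936, 11 df, chi2_{0.025} = 21.9200, chi2_{0.975} = 3.8157
low, high = variance_interval(0.8936, 11, 21.9200, 3.8157)
print(f"95% CI for sigma^2: ({low:.4f}, {high:.4f})")
```

The result agrees with Eq. (5.4.4) up to rounding in the last digit. Unlike the t-based interval for β₂, it is not symmetric about σ̂² = 0.8936, which reflects the skewness of the chi-square distribution.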


5.5 Hypothesis Testing: General Comments 


Having discussed the problem of point and interval estimation, we shall now consider the topic of hypothesis 
testing. In this section we discuss briefly some general aspects of this topic. Appendix A gives some additional 
details. 

The problem of statistical hypothesis testing may be stated simply as follows: Is a given observation or finding compatible with some stated hypothesis or not? The word "compatible," as used here, means "sufficiently" close to the hypothesized value so that we do not reject the stated hypothesis. Thus, suppose some theory or prior experience leads us to believe that the true slope coefficient β₂ of the wages-education example is unity. Is the observed β̂₂ = 0.724 obtained from the sample of Table 3.2 consistent with the stated hypothesis? If it is, we do not reject the hypothesis; otherwise, we may reject it.

In the language of statistics, the stated hypothesis is known as the null hypothesis and is denoted by the symbol H₀. The null hypothesis is usually tested against an alternative hypothesis (also known as maintained hypothesis) denoted by H₁, which may state, for example, that true β₂ is different from unity. The alternative hypothesis may be simple or composite.⁷ For example, H₁: β₂ = 1.5 is a simple hypothesis, but H₁: β₂ ≠ 1.5 is a composite hypothesis.

The theory of hypothesis testing is concerned with developing rules or procedures for deciding whether 
to reject or not reject the null hypothesis. There are two mutually complementary approaches for devising 
such rules, namely, confidence interval and test of significance. Both these approaches predicate that the 
variable (statistic or estimator) under consideration has some probability distribution and that hypothesis 
testing involves making statements or assertions about the value(s) of the parameter(s) of such distribution. 
For example, we know that with the normality assumption β̂₂ is normally distributed with mean equal to β₂


7A statistical hypothesis is called a simple hypothesis if it specifies the precise value(s) of the parameter(s) of a probability density function; otherwise, it is called a composite hypothesis. For example, in the normal pdf (1/(σ√(2π))) exp{−½[(X − μ)/σ]²}, if we assert that H₁: μ = 15 and σ = 2, it is a simple hypothesis. But if H₁: μ = 15 and σ > 15, it is a composite hypothesis, because the standard deviation does not have a specific value.
and variance given by Eq. (4.3.5). If we hypothesize that β₂ = 1, we are making an assertion about one of the parameters of the normal distribution, namely, the mean. Most of the statistical hypotheses encountered in this text will be of this type: they make assertions about one or more values of the parameters of some assumed probability distribution such as the normal, F, t, or χ². How this is accomplished is discussed in the following two sections.


5.6 Hypothesis Testing: The Confidence-Interval Approach 
Two-Sided or Two-Tail Test 


To illustrate the confidence interval approach, once again we revert to our wages-education example. From the regression results given in Eq. (3.6.1), we know that the slope coefficient is 0.7240. Suppose we postulate that


H₀: β₂ = 0.5
H₁: β₂ ≠ 0.5


that is, the true slope coefficient is 0.5 under the null hypothesis but less than or greater than 0.5 under the alternative hypothesis. The null hypothesis is a simple hypothesis, whereas the alternative hypothesis is composite; actually it is what is known as a two-sided hypothesis. Very often such a two-sided alternative hypothesis reflects the fact that we do not have a strong a priori or theoretical expectation about the direction in which the alternative hypothesis should move from the null hypothesis.

Is the observed β̂₂ compatible with H₀? To answer this question, let us refer to the confidence interval in Eq. (5.3.9). We know that in the long run intervals like (0.5700, 0.8780) will contain the true β₂ with 95 percent probability. Consequently, in the long run (i.e., repeated sampling) such intervals provide a range or limits within which the true β₂ may lie with a confidence coefficient of, say, 95 percent. Thus, the confidence interval provides a set of plausible null hypotheses. Therefore, if β₂ under H₀ falls within the 100(1 − α)% confidence interval, we do not reject the null hypothesis; if it lies outside the interval, we may reject it.⁸ This range is illustrated schematically in Figure 5.2.


Figure 5.2 A 100(1 − α)% confidence interval for β₂, running from β̂₂ − t_{α/2} se(β̂₂) to β̂₂ + t_{α/2} se(β̂₂). Values of β₂ lying in this interval are plausible under H₀ with 100(1 − α)% confidence. Hence, do not reject H₀ if β₂ lies in this region.
8Always bear in mind that there is a 100α percent chance that the confidence interval does not contain β₂ under H₀ even though the hypothesis is correct. In short, there is a 100α percent chance of committing a Type I error. Thus, if α = 0.05, there is a 5 percent chance that we could reject the null hypothesis even though it is true.
Decision Rule Construct a 100(1 − α)% confidence interval for β₂. If the β₂ under H₀ falls within this confidence interval, do not reject H₀, but if it falls outside this interval, reject H₀.


Following this rule, for our hypothetical example, H₀: β₂ = 0.5 clearly lies outside the 95 percent confidence interval given in Eq. (5.3.9). Therefore, we can reject the hypothesis that the true slope is 0.5, with 95 percent confidence. If the null hypothesis were true, the probability of our obtaining a value of slope of as much as 0.7240 by sheer chance or fluke is at the most about 5 percent, a small probability.

In statistics, when we reject the null hypothesis, we say that our finding is statistically significant. On the 
other hand, when we do not reject the null hypothesis, we say that our finding is not statistically significant. 

Some authors use a phrase such as "highly statistically significant." By this they usually mean that when they reject the null hypothesis, the probability of committing a Type I error (i.e., α) is a small number, usually 1 percent. But as our discussion of the p value in Section 5.8 will show, it is better to leave it to the researcher to decide whether a statistical finding is "significant," "moderately significant," or "highly significant."


One-Sided or One-Tail Test 


Sometimes we have a strong a priori or theoretical expectation (or expectations based on some previous 
empirical work) that the alternative hypothesis is one-sided or unidirectional rather than two-sided, as just 
discussed. Thus, for our wages-education example, one could postulate that 


H₀: β₂ ≤ 0.5 and H₁: β₂ > 0.5


Perhaps economic theory or prior empirical work suggests that the slope is greater than 0.5. Although the procedure to test this hypothesis can be easily derived from Eq. (5.3.5), the actual mechanics are better explained in terms of the test-of-significance approach discussed next.⁹


5.7 Hypothesis Testing: The Test-of-Significance Approach 


Testing the Significance of Regression Coefficients: The t Test 


An alternative but complementary approach to the confidence-interval method of testing statistical hypotheses is the test-of-significance approach developed along independent lines by R. A. Fisher and jointly by Neyman and Pearson.¹⁰ Broadly speaking, a test of significance is a procedure by which sample results are used to verify the truth or falsity of a null hypothesis. The key idea behind tests of significance is that of a test statistic (estimator) and the sampling distribution of such a statistic under the null hypothesis. The decision to accept or reject H₀ is made on the basis of the value of the test statistic obtained from the data at hand.
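As an illustration, recall that under the normality assumption the variable

$$t = \frac{\hat{\beta}_2 - \beta_2}{\operatorname{se}(\hat{\beta}_2)} = \frac{(\hat{\beta}_2 - \beta_2)\sqrt{\sum x_i^2}}{\hat{\sigma}} \tag{5.3.2}$$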


9If you want to use the confidence interval approach, construct a 100(1 − α)% one-sided or one-tail confidence interval for β₂. Why?

10Details may be found in E. L. Lehmann, Testing Statistical Hypotheses, John Wiley & Sons, New York, 1959.
follows the t distribution with n − 2 df. If the value of true β₂ is specified under the null hypothesis, the t value of Eq. (5.3.2) can readily be computed from the available sample, and therefore it can serve as a test statistic. And since this test statistic follows the t distribution, confidence-interval statements such as the following can be made:


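$$\Pr\left[-t_{\alpha/2} \le \frac{\hat{\beta}_2 - \beta_2^*}{\operatorname{se}(\hat{\beta}_2)} \le t_{\alpha/2}\right] = 1 - \alpha \tag{5.7.1}$$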


where β₂* is the value of β₂ under H₀ and where −t_{α/2} and t_{α/2} are the values of t (the critical t values) obtained from the t table for (α/2) level of significance and n − 2 df [cf. Eq. (5.3.4)]. The t table is given in Appendix D. Rearranging Equation 5.7.1, we obtain


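$$\Pr[\beta_2^* - t_{\alpha/2}\operatorname{se}(\hat{\beta}_2) \le \hat{\beta}_2 \le \beta_2^* + t_{\alpha/2}\operatorname{se}(\hat{\beta}_2)] = 1 - \alpha \tag{5.7.2}$$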


which gives the interval in which β̂₂ will fall with probability 1 − α, given β₂ = β₂*. In the language of hypothesis testing, the 100(1 − α)% confidence interval established in Equation 5.7.2 is known as the region of acceptance (of the null hypothesis). The region(s) outside the confidence interval is (are) called the region(s) of rejection (of H₀) or the critical region(s). As noted previously, the confidence limits, the endpoints of the confidence interval, are also called critical values.

The intimate connection between the confidence-interval and test-of-significance approaches to hypothesis testing can now be seen by comparing Eq. (5.3.5) with Eq. (5.7.2). In the confidence-interval procedure we try to establish a range or an interval that has a certain probability of including the true but unknown β₂. In the test-of-significance approach, by contrast, we hypothesize some value for β₂ and try to see whether the computed β̂₂ lies within reasonable (confidence) limits around the hypothesized value.
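The two procedures always reach the same verdict. A hypothesized value lies outside β̂₂ ± t_{α/2} se(β̂₂) exactly when |β̂₂ − β₂*| / se(β̂₂) exceeds t_{α/2}. The Python sketch below checks this for a grid of hypothesized slopes, using the wages-education figures. The function names and the grid of null values are our own illustrative choices.

```python
def rejects_by_interval(beta_null, estimate, se, t_crit):
    """Section 5.6 rule: reject H0 if beta_null lies outside estimate +/- t_crit * se."""
    return not (estimate - t_crit * se <= beta_null <= estimate + t_crit * se)


def rejects_by_t(beta_null, estimate, se, t_crit):
    """Section 5.7 rule: reject H0 if |(estimate - beta_null) / se| exceeds t_crit."""
    return abs((estimate - beta_null) / se) > t_crit


estimate, se, t_crit = 0.7240, 0.0700, 2.201  # wages-education figures, 11 df, alpha = 5%
for beta_null in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    a = rejects_by_interval(beta_null, estimate, se, t_crit)
    b = rejects_by_t(beta_null, estimate, se, t_crit)
    print(f"H0: beta2 = {beta_null}: interval rule rejects={a}, t rule rejects={b}")
```

Both rules reject 0.5, 0.9, and 1.0 and retain 0.6, 0.7, and 0.8. The retained values are exactly those inside the interval (0.5700, 0.8780).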

Once again let us return to our wages-education example. We know that β̂₂ = 0.7240, se(β̂₂) = 0.0700, and df = 11. If we assume α = 5%, t_{α/2} = 2.201.

If we assume H₀: β₂ = β₂* = 0.5 and H₁: β₂ ≠ 0.5, Eq. (5.7.2) becomes


$$\Pr(0.3460 \le \hat{\beta}_2 \le 0.6540) = 0.95 \tag{5.7.3}$$


as shown diagrammatically in Figure 5.3. 
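Eq. (5.7.2) centres the interval on the hypothesized value β₂* = 0.5 rather than on β̂₂. This minimal Python sketch computes that region of acceptance and the t ratio from the middle of Eq. (5.7.1). The helper names are ours, and the inputs are the figures quoted above.

```python
def acceptance_region(beta_null, se, t_crit):
    """Eq. (5.7.2): beta_null +/- t_crit * se, where the estimator falls with prob. 1 - alpha under H0."""
    return beta_null - t_crit * se, beta_null + t_crit * se


def t_statistic(estimate, beta_null, se):
    """Middle term of Eq. (5.7.1): (estimate - hypothesized value) / standard error."""
    return (estimate - beta_null) / se


lo, hi = acceptance_region(0.5, 0.0700, 2.201)
t = t_statistic(0.7240, 0.5, 0.0700)
print(f"acceptance region: ({lo:.4f}, {hi:.4f})")
print(f"t = {t:.2f}; estimate inside region: {lo <= 0.7240 <= hi}")
```

The region printed, (0.3459, 0.6541), matches Eq. (5.7.3) up to the same rounding of the half-width 0.15407. The observed 0.7240 falls outside it.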

In practice, there is no need to compute Eq. (5.7.2) explicitly. One can compute the t value in the middle of the double inequality given by Eq. (5.7.1) and see whether it lies between the critical t values or outside them. For our example,

$$t = \frac{0.7240 - 0.5}{0.0700} = 3.2 \tag{5.7.4}$$

which clearly lies in the critical region of Figure 5.4. The conclusion remains the same; namely, we reject H₀.

11In Sec. 5.2, point 4, it was stated that we cannot say that the probability is 95 percent that the fixed interval (0.5700, 0.8780) includes the true β₂. But we can make the probabilistic statement given in Eq. (5.7.3) because β̂₂, being an estimator, is a random variable.
Figure 5.3 The 95% confidence interval for β̂₂ under the hypothesis that β₂ = 0.5. The region of acceptance runs from 0.3460 to 0.6540, with a 2.5% critical region in each tail; β̂₂ = 0.7240 lies in the right-hand critical region.

Figure 5.4 The 95% confidence interval for t (11 df). The region of acceptance runs from −2.201 to +2.201, with a 2.5% critical region in each tail; t = 3.2 lies in the right-hand critical region.


Notice that if the estimated β₂ (= β̂₂) is equal to the hypothesized β₂, the t value in Equation 5.7.4 will be zero. However, as the estimated β₂ value departs from the hypothesized β₂ value, |t| will be increasingly large. (Here |t| is the absolute t value; note that t can be positive as well as negative.) Therefore, a "large" |t| value will be evidence against the null hypothesis. Of course, we can always use the t table to determine whether a particular t value is large or small. The answer, as we know, depends on the degrees of freedom as well as on the probability of Type I error that we are willing to accept. If you take a look at the t table given in Appendix D (Table D.2), you will observe that for any given value of df the probability of obtaining an increasingly large |t| value becomes progressively smaller. Thus, for 20 df the probability of obtaining a |t| value of 1.725 or greater is 0.10 or 10 percent, but for the same df the probability of obtaining a |t| value of 3.552 or greater is only 0.002 or 0.2 percent.
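The two table entries for 20 df can be reproduced without a printed table. A statistics library would normally do this job (for example, SciPy's `scipy.stats.t.sf`). The sketch below instead stays within the Python standard library. It integrates the t density numerically with Simpson's rule. The helper names, and the choice of numerical integration in place of a t table, are our own assumptions.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tail_prob(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|) = 1 - 2 * integral of the density over [0, |t|] (Simpson's rule)."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3.0


for t in (1.725, 3.552):
    print(f"20 df, P(|t| >= {t}) = {two_tail_prob(t, 20):.4f}")
```

The printed probabilities are approximately 0.10 and 0.002, matching the figures quoted from Table D.2.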

Since we use the t distribution, the preceding testing procedure is appropriately called the t test. In the language of significance tests, a statistic is said to be statistically significant if the value of the test statistic lies in the critical region. In this case the null hypothesis is rejected. By the same token, a test is said to be statistically insignificant if the value of the test statistic lies in the acceptance region. In this situation, the null hypothesis is not rejected. In our example, the t test is significant and hence we reject the null hypothesis.
Before concluding our discussion of hypothesis testing, note that the testing procedure just outlined is known as a two-sided, or two-tail, test-of-significance procedure. It considers the two extreme tails of the relevant probability distribution, the rejection regions, and rejects the null hypothesis if the test statistic lies in either tail. But this happens because our H₁ was a two-sided composite hypothesis: β₂ ≠ 0.5 means β₂ is either greater than or less than 0.5. But suppose prior experience suggests to us that the slope is expected to be greater than 0.5. In this case we have H₀: β₂ = 0.5 and H₁: β₂ > 0.5. Although H₁ is still a composite hypothesis, it is now one-sided. To test this hypothesis, we use the one-tail test (the right tail), as shown in Figure 5.5. (See also the discussion in Section 5.6.)

The test procedure is the same as before except that the upper confidence limit or critical value now corresponds to t_α = t_{0.05}, that is, the 5 percent level. As Figure 5.5 shows, we need not consider the lower tail of the t distribution in this case. Whether one uses a two- or one-tail test of significance will depend upon how the alternative hypothesis is formulated. That, in turn, may depend upon some a priori considerations or prior empirical experience. (But more on this in Section 5.8.)
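For the one-tail test the critical value is t_{0.05} with 11 df. This critical value fixes the upper limit 0.5 + t_{0.05} se(β̂₂) shown in Figure 5.5. The sketch below finds t_{0.05} by bisection on a numerically integrated right-tail probability, using only the Python standard library. The helper names, the bisection bracket, and the tolerance are our own assumptions, and the approach stands in for a t table or a statistics library.

```python
import math


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0: 0.5 minus the Simpson-rule integral of the t density over [0, t]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))

    def pdf(x: float) -> float:
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3.0


def t_critical(alpha: float, df: int, hi: float = 50.0, tol: float = 1e-6) -> float:
    """Right-tail critical value t_alpha, solving P(T > t) = alpha by bisection (0 < alpha < 0.5)."""
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # too much probability beyond mid: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2


t05 = t_critical(0.05, 11)
upper_limit = 0.5 + t05 * 0.0700  # 0.5 + t_0.05 * se(beta2_hat), as in Figure 5.5
print(f"t_0.05 (11 df) = {t05:.3f}, upper limit = {upper_limit:.4f}, reject H0: {3.2 > t05}")
```

The search returns about 1.796 and an upper limit of about 0.6257. Since t = 3.2 exceeds 1.796, H₀ is rejected in the one-tail test as well.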

We can summarize the t test of significance approach to hypothesis testing as shown in Table 5.1.


Figure 5.5 One-tail test of significance. Upper panel: the density of β̂₂ with a 95% region of acceptance below 0.6257 = 0.5 + 1.796 se(β̂₂); β̂₂ = 0.7240 lies in the right-tail critical region. Lower panel: the corresponding t distribution, in which t = 3.2 lies in the 5% right-tail critical region.
Table 5.1 The t Test of Significance: Decision Rules

Type of        H₀: The Null    H₁: The Alternative    Decision Rule:
Hypothesis     Hypothesis      Hypothesis             Reject H₀ If
Two-tail       β₂ = β₂*        β₂ ≠ β₂*               |t| > t_{α/2,df}
Right-tail     β₂ ≤ β₂*        β₂ > β₂*               t > t_{α,df}
Left-tail      β₂ ≥ β₂*        β₂ < β₂*               t < −t_{α,df}

Notes: β₂* is the hypothesized numerical value of β₂.

|t| means the absolute value of t.

t_α or t_{α/2} means the critical t value at the α or α/2 level of significance.

df: degrees of freedom, (n − 2) for the two-variable model, (n − 3) for the three-variable model, and so on.
The same procedure holds to test hypotheses about β₁.
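The three rows of Table 5.1 translate directly into a small decision function. This is a hedged Python sketch with a function name and string labels of our own choosing. It assumes the caller supplies the critical value that matches the alternative: t_{α/2,df} for a two-tail test, and t_{α,df} for a one-tail test.

```python
def reject_h0(t: float, alternative: str, t_crit: float) -> bool:
    """Decision rules of Table 5.1.

    t_crit must match the alternative: t_{alpha/2,df} for a two-tail test,
    t_{alpha,df} for a right-tail or left-tail test.
    """
    if alternative == "two-tail":
        return abs(t) > t_crit
    if alternative == "right-tail":
        return t > t_crit
    if alternative == "left-tail":
        return t < -t_crit
    raise ValueError(f"unknown alternative: {alternative!r}")


# Wages-education example: t = 3.2 with 11 df (critical values 2.201 and 1.796 from the text)
print(reject_h0(3.2, "two-tail", 2.201))    # True
print(reject_h0(3.2, "right-tail", 1.796))  # True
print(reject_h0(3.2, "left-tail", 1.796))   # False
```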


Testing the Significance of σ²: The χ² Test


As another illustration of the test-of-significance methodology, consider the following variable:

$$\chi^2 = (n-2)\frac{\hat{\sigma}^2}{\sigma^2} \tag{5.4.1}$$

which, as noted previously, follows the χ² distribution with n − 2 df. For our example, σ̂² = 0.8937 and df = 11. If we postulate that H₀: σ² = 0.6 versus H₁: σ² ≠ 0.6, Equation 5.4.1 provides the test statistic for H₀. Substituting the appropriate values in Eq. (5.4.1), it can be found that under H₀, χ² = 16.3845. If we assume α = 5%, the critical χ² values are 3.8157 and 21.9200. Since the computed χ² lies between these limits, the data support the null hypothesis and we do not reject it. (See Figure 5.1.) This test procedure is called the chi-square test of significance. The χ² test of significance approach to hypothesis testing is summarized in Table 5.2.


Table 5.2 A Summary of the χ² Test

H₀: The Null    H₁: The Alternative    Critical Region:
Hypothesis      Hypothesis             Reject H₀ If
σ² = σ₀²        σ² > σ₀²               df(σ̂²)/σ₀² > χ²_{α,df}
σ² = σ₀²        σ² < σ₀²               df(σ̂²)/σ₀² < χ²_{(1−α),df}
σ² = σ₀²        σ² ≠ σ₀²               df(σ̂²)/σ₀² > χ²_{α/2,df} or < χ²_{(1−α/2),df}

Note: σ₀² is the value of σ² under the null hypothesis. The first subscript on χ² in the last column is the level of significance, and the second subscript is the degrees of freedom. These are critical chi-square values. Note that df is (n − 2) for the two-variable regression model, (n − 3) for the three-variable regression model, and so on.
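Table 5.2 and the numerical example preceding it can be combined in one short sketch. The function names and alternative labels are our own. The critical values are supplied by the caller, here the two values quoted for 11 df at α = 5%.

```python
def chi2_statistic(sigma2_hat: float, sigma2_null: float, df: int) -> float:
    """Eq. (5.4.1) evaluated under H0: df * sigma2_hat / sigma0^2."""
    return df * sigma2_hat / sigma2_null


def reject_variance_h0(stat: float, alternative: str, chi2_lower: float, chi2_upper: float) -> bool:
    """Decision rules of Table 5.2.

    chi2_upper is the upper critical value (chi2_{alpha,df}, or chi2_{alpha/2,df} for the
    two-sided test); chi2_lower is the lower one (chi2_{1-alpha,df}, or chi2_{1-alpha/2,df}).
    """
    if alternative == "greater":
        return stat > chi2_upper
    if alternative == "less":
        return stat < chi2_lower
    if alternative == "two-sided":
        return stat > chi2_upper or stat < chi2_lower
    raise ValueError(f"unknown alternative: {alternative!r}")


# Example from the text: sigma2_hat = 0.8937, 11 df, H0: sigma^2 = 0.6 against a two-sided H1
stat = chi2_statistic(0.8937, 0.6, 11)
print(f"chi-square = {stat:.4f}, reject H0: {reject_variance_h0(stat, 'two-sided', 3.8157, 21.9200)}")
```

This prints chi-square = 16.3845 and reject H0: False, as in the text.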
5.8 Hypothesis Testing: Some Practical Aspects


The Meaning of “Accepting” or “Rejecting” a Hypothesis 


If, on the basis of a test of significance, say, the t test, we decide to "accept" the null hypothesis, all we are saying is that on the basis of the sample evidence we have no reason to reject it. We are not saying that the null hypothesis is true beyond any doubt. Why? To answer this, let us return to our wages-education example and assume that H₀: β₂ = 0.70. Now the estimated value of the slope is β̂₂ = 0.7241 with se(β̂₂) = 0.0701. Then on the basis of the t test we find that

$$t = \frac{0.7241 - 0.7}{0.0701} = 0.3438$$

which is insignificant, say, at α = 5%. Therefore, we say "accept" H₀. But now let us assume H₀: β₂ = 0.6. Applying the t test again, we obtain

$$t = \frac{0.7241 - 0.6}{0.0701} = 1.7703$$

which is also statistically insignificant. So now we say "accept" this H₀. Which of these two null hypotheses is the "truth"? We do not know. Therefore, in "accepting" a null hypothesis we should always be aware that another null hypothesis may be equally compatible with the data. It is therefore preferable to say that we may accept the null hypothesis rather than we (do) accept it. Better still,

. . . just as a court pronounces a verdict as "not guilty" rather than "innocent," so the conclusion of a statistical test is "do not reject" rather than "accept."¹²
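The point that several null hypotheses can survive the same data is easy to see numerically. The sketch below applies the two-tail 5% test to both hypothesized slopes, using the figures of this subsection. The function name `verdicts` is ours. The critical value 2.201 assumes the same 11 df as the rest of the example.

```python
def verdicts(estimate: float, se: float, t_crit: float, nulls) -> dict:
    """Map each hypothesized slope to (t value, whether H0 is rejected in a two-tail test)."""
    results = {}
    for beta_null in nulls:
        t = (estimate - beta_null) / se
        results[beta_null] = (t, abs(t) > t_crit)
    return results


# Section 5.8 figures: slope 0.7241, se 0.0701; 2.201 is the 5% two-tail value for 11 df
for beta_null, (t, rejected) in verdicts(0.7241, 0.0701, 2.201, (0.70, 0.6)).items():
    print(f"H0: beta2 = {beta_null}: t = {t:.4f}, {'reject' if rejected else 'do not reject'}")
```

Both t values (0.3438 and 1.7703) fall short of 2.201, so neither hypothesis is rejected, although at most one of them can be true.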


The “Zero” Null Hypothesis and the “2-t” Rule of Thumb 


A null hypothesis that is commonly tested in empirical work is H₀: β₂ = 0, that is, the slope coefficient is zero. This "zero" null hypothesis is a kind of straw man, the objective being to find out whether Y is related at all to X, the explanatory variable. If there is no relationship between Y and X to begin with, then testing a hypothesis such as β₂ = 0.3 or any other value is meaningless.

This null hypothesis can be easily tested by the confidence interval or the t-test approach discussed in the preceding sections. But very often such formal testing can be shortcut by adopting the "2-t" rule of significance, which may be stated as


"2-t" Rule of Thumb If the number of degrees of freedom is 20 or more and if α, the level of significance, is set at 0.05, then the null hypothesis β₂ = 0 can be rejected if the t value [= β̂₂/se(β̂₂)] computed from Eq. (5.3.2) exceeds 2 in absolute value.


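The rationale for this rule is not too difficult to grasp. From Eq. (5.7.1) we know that we will reject H₀: β₂ = 0 if

$$t = \frac{\hat{\beta}_2}{\operatorname{se}(\hat{\beta}_2)} > t_{\alpha/2} \quad \text{when } \hat{\beta}_2 > 0$$

or

$$t = \frac{\hat{\beta}_2}{\operatorname{se}(\hat{\beta}_2)} < -t_{\alpha/2} \quad \text{when } \hat{\beta}_2 < 0$$

or when

$$|t| = \left|\frac{\hat{\beta}_2}{\operatorname{se}(\hat{\beta}_2)}\right| > t_{\alpha/2} \tag{5.8.1}$$

for the appropriate degrees of freedom.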


12Jan Kmenta, Elements of Econometrics, Macmillan, New York, 1971, p. 114. 
Now if we examine the t table given in Appendix D, we see that for df of about 20 or more a computed t value in excess of 2 (in absolute terms), say, 2.1, is statistically significant at the 5 percent level, implying rejection of the null hypothesis. Therefore, if we find that for 20 or more df the computed t value is, say, 2.5 or 3, we do not even have to refer to the t table to assess the significance of the estimated slope coefficient. Of course, one can always refer to the t table to obtain the precise level of significance, and one should always do so when the df are fewer than, say, 20.

In passing, note that if we are testing the one-sided hypothesis β₂ = 0 versus β₂ > 0 or β₂ < 0, then we should reject the null hypothesis if

$$|t| = \left|\frac{\hat{\beta}_2}{\operatorname{se}(\hat{\beta}_2)}\right| > t_{\alpha} \tag{5.8.2}$$

provided t has the sign specified by the alternative hypothesis. If we fix α at 0.05, then from the t table we observe that for 20 or more df a t value in excess of 1.73 is statistically significant at the 5 percent level of significance (one-tail). Hence, whenever a t value exceeds, say, 1.8 (in absolute terms) and the df are 20 or more, one need not consult the t table for the statistical significance of the observed coefficient. Of course, if we choose α at 0.01 or any other level, we will have to decide on the appropriate t value as the benchmark value. But by now the reader should be able to do that.
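Both rules of thumb rest on how the 5 percent critical values behave as df grows. The sketch below prints two-tail and one-tail 5 percent critical values for a few df. It reuses the same standard-library approach as the earlier sketches: Simpson-rule integration of the t density, then bisection. The helper names, the df values, and the numerical method are our own illustrative choices, not the book's t table.

```python
import math


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0: 0.5 minus the Simpson-rule integral of the t density over [0, t]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))

    def pdf(x: float) -> float:
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3.0


def t_critical(alpha: float, df: int, hi: float = 50.0, tol: float = 1e-6) -> float:
    """Right-tail critical value t_alpha found by bisection (0 < alpha < 0.5)."""
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print("df   two-tail 5%   one-tail 5%")
for df in (11, 20, 30, 60, 120):
    print(f"{df:<4} {t_critical(0.025, df):>11.3f} {t_critical(0.05, df):>13.3f}")
```

The two-tail values fall from about 2.2 at 11 df to roughly 2.09 at 20 df and toward 2 beyond that. The one-tail values sit near 1.7 from 20 df on, which is why the benchmarks 2 and 1.73 work once df reach about 20.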


Forming the Null and Alternative Hypotheses¹³


Given the null and the alternative hypotheses, testing them for statistical significance should no longer be a mystery. But how does one formulate these hypotheses? There are no hard-and-fast rules. Very often the phenomenon under study will suggest the nature of the null and alternative hypotheses. For example, consider the capital market line (CML) of portfolio theory, which postulates that Eᵢ = β₁ + β₂σᵢ, where E = expected return on portfolio and σ = the standard deviation of return, a measure of risk. Return and risk are expected to be positively related (the higher the risk, the higher the return). The natural alternative hypothesis to the null hypothesis that β₂ = 0 would therefore be β₂ > 0. That is, one would not choose to consider values of β₂ less than zero.

But consider the case of the demand for money. As we shall show later, one of the important determinants of the demand for money is income. Prior studies of the money demand functions have shown that the income elasticity of demand for money (the percent change in the demand for money for a 1 percent change in income) has typically ranged between 0.7 and 1.3. Therefore, in a new study of demand for money, if one postulates that the income-elasticity coefficient β₂ is 1, the alternative hypothesis could be that β₂ ≠ 1, a two-sided alternative hypothesis.

Thus, theoretical expectations or prior empirical work or both can be relied upon to formulate hypotheses. But no matter how the hypotheses are formed, it is extremely important that the researcher establish these hypotheses before carrying out the empirical investigation. Otherwise, he or she will be guilty of circular reasoning or self-fulfilling prophecies. That is, if one were to formulate hypotheses after examining the empirical results, there may be the temptation to form hypotheses that justify one's results. Such a practice should be avoided at all costs, at least for the sake of scientific objectivity. Keep in mind the Stigler quotation given at the beginning of this chapter!

13For an interesting discussion about formulating hypotheses, see J. Bradford De Long and Kevin Lang, "Are All Economic Hypotheses False?" Journal of Political Economy, vol. 100, no. 6, 1992, pp. 1257–1272.
Choosing α, the Level of Significance


It should be clear from the discussion so far that whether we reject or do not reject the null hypothesis depends critically on α. This is the level of significance, or the probability of committing a Type I error, that is, the probability of rejecting the true hypothesis. In Appendix A we discuss fully the nature of a Type I error, its relationship to a Type II error (the probability of accepting the false hypothesis), and why classical statistics generally concentrates on a Type I error. But even then, why is α commonly fixed at the 1, 5, or, at the most, 10 percent levels? As a matter of fact, there is nothing sacrosanct about these values; any other values will do just as well.

In an introductory book like this it is not possible to discuss in depth why one chooses the 1, 5, or 10 percent levels of significance, for that will take us into the field of statistical decision making, a discipline unto itself. A brief summary, however, can be offered. As we discuss in Appendix A, for a given sample size, if we try to reduce a Type I error, a Type II error increases, and vice versa. That is, given the sample size, if we try to reduce the probability of rejecting the true hypothesis, we at the same time increase the probability of accepting the false hypothesis. So there is a trade-off involved between these two types of errors, given the sample size. Now the only way we can decide about the trade-off is to find out the relative costs of the two types of errors. Then,


If the error of rejecting the null hypothesis which is in fact true (Error Type I) is costly relative to the error of not rejecting the null hypothesis which is in fact false (Error Type II), it will be rational to set the probability of the first kind of error low. If, on the other hand, the cost of making Error Type I is low relative to the cost of making Error Type II, it will pay to make the probability of the first kind of error high (thus making the probability of the second type of error low).¹⁴


Of course, the rub is that we rarely know the costs of making the two types of errors. Thus, applied econometricians generally follow the practice of setting the value of α at a 1 or a 5 or at most a 10 percent level and choose a test statistic that would make the probability of committing a Type II error as small as possible. Since one minus the probability of committing a Type II error is known as the power of the test, this procedure amounts to maximizing the power of the test. (See Appendix A for a discussion of the power of a test.)
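The trade-off between the two types of error can be illustrated with a one-sided z test, a simplification of the t tests used in the text. In the sketch below, which assumes the SciPy library is installed, the true parameter is taken to lie a hypothetical 2 standard errors above the null value; the helper name `type_ii_error` is illustrative only.

```python
from scipy.stats import norm


def type_ii_error(alpha: float, shift: float) -> float:
    """Pr(Type II error) for a one-sided z test of H0: mu = 0 against H1: mu > 0,
    when the true mean lies `shift` standard errors above the null value."""
    critical = norm.ppf(1 - alpha)        # reject H0 when z exceeds this value
    return norm.cdf(critical - shift)     # Pr(z <= critical) when H1 is true


shift = 2.0  # hypothetical true effect, in standard-error units
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, shift)
    print(f"alpha = {alpha:.2f}: Pr(Type II) = {beta:.3f}, power = {1 - beta:.3f}")
```

Lowering α from 10 to 1 percent raises the probability of a Type II error (and lowers the power), which is the trade-off described above for a fixed sample size.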

Fortunately, the dilemma of choosing the appropriate value of α can be avoided by using what is known as the p value of the test statistic, which is discussed next.


The Exact Level of Significance: The p Value 


As just noted, the Achilles heel of the classical approach to hypothesis testing is its arbitrariness in selecting α. Once a test statistic (e.g., the t statistic) is obtained in a given example, why not simply go to the appropriate statistical table and find out the actual probability of obtaining a value of the test statistic as much as or greater than that obtained in the example? This probability is called the p value (i.e., probability value), also known as the observed or exact level of significance or the exact probability of committing a Type I error. More technically, the p value is defined as the lowest significance level at which a null hypothesis can be rejected.

To illustrate, let us return to our wages-education example. Given the null hypothesis that the true coefficient of education is 0.5, we obtained a t value of 3.2 in Eq. (5.7.4). What is the p value of obtaining a t value of as much as or greater than 3.2? Looking up the t table given in Appendix D, we observe that for 11 df the probability of obtaining such a t value must be smaller than 0.005 (one-tail) or 0.010 (two-tail).
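Statistical software computes this tail probability directly instead of bracketing it from a table. The sketch below assumes the SciPy library is installed and uses `scipy.stats.t.sf`, the survival function (one minus the CDF) of the t distribution, with the 11 df used in the text; the variable names are illustrative.

```python
from scipy import stats

t_value = 3.2  # t statistic under H0: true education coefficient = 0.5, Eq. (5.7.4)
df = 11        # degrees of freedom used in the text

p_one_tail = stats.t.sf(t_value, df)  # Pr(t >= 3.2)
p_two_tail = 2 * p_one_tail           # Pr(|t| >= 3.2), by symmetry of the t distribution

print(f"one-tail p = {p_one_tail:.4f}, two-tail p = {p_two_tail:.4f}")
```

Both values fall below the table bounds of 0.005 (one-tail) and 0.010 (two-tail), as they must.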


14Jan Kmenta, Elements of Econometrics, Macmillan, New York, 1971, pp. 126-127.
If you use a statistical package such as Stata or EViews, you will find that the p value of obtaining a t value of 3.2 or greater with 11 df is about 0.004 (one-tail), or about 0.0085 if both tails are counted. This is the p value of the observed t statistic. This exact level of significance of the t statistic is smaller than the conventionally, and arbitrarily, fixed levels of significance such as 1, 5, or 10 percent. As a matter of fact, if we were to use the p value just computed, and reject the null hypothesis that the true coefficient of education is 0.5, the probability of our committing a Type I error would be less than 1 in 100!

As we noted earlier, if the data do not support the null hypothesis, |t| obtained under the null hypothesis will be “large” and therefore the p value of obtaining such a |t| value will be “small.” In other words, for a given sample size, as |t| increases, the p value decreases, and one can therefore reject the null hypothesis with increasing confidence.

What is the relationship of the p value to the level of significance α? If we make a habit of fixing α equal to the p value of a test statistic (e.g., the t statistic), then there is no conflict between the two values. To put it differently, it is better to give up fixing α arbitrarily at some level and simply choose the p value of the test statistic. It is preferable to leave it to the reader to decide whether to reject the null hypothesis at the given p value. If in an application the p value of a test statistic happens to be, say, 0.145, or 14.5 percent, and if the reader wants to reject the null hypothesis at this (exact) level of significance, so be it. Nothing is wrong with taking a chance of being wrong 14.5 percent of the time if you reject the true null hypothesis. Similarly, as in our wages-education example, there is nothing wrong if the researcher wants to choose a p value of less than 1 percent and not take a chance of being wrong more than 1 out of 100 times. After all, some investigators may be risk-lovers and some risk-averters!

In the rest of this text, we will generally quote the p value of a given test statistic. Some readers may want to fix α at some level and reject the null hypothesis if the p value is less than α. That is their choice.


Statistical Significance versus Practical Significance 


Look back at Example 3.1 and the regression results given in Equation (3.7.1). This regression relates private final consumption expenditure (PFCE) to gross domestic product (GDP) in India for the period 1950-51 to 2006-07, both variables being measured in rupee crore at 1999-2000 prices.

From this regression we see that the marginal propensity to consume (MPC), that is, the additional consumption as a result of an additional rupee of income (as measured by GDP), is about 0.63, or about 63 paise. Using the data in Eq. (3.7.1), the reader can verify that the 95 percent confidence interval for the MPC is (0.6183, 0.6423). (Note: Since there are 56 df in this problem, the t table does not give a precise critical t value for these df. Hence, you can use the 2-t rule of thumb to compute the 95 percent confidence interval.)

Suppose someone maintains that the true MPC is 0.65. Is this number different from 0.63? It is, if we strictly adhere to the confidence interval established above, since 0.65 lies outside that interval.

But what is the practical or substantive significance of our finding? That is, what difference does it make if 
we take the MPC to be 0.65 rather than 0.63? Is this difference of 0.02 between the two MPCs that important 
practically? 

The answer to this question depends on what we plan to do with these estimates. For example, from macroeconomics we know that the income multiplier is 1/(1 − MPC). Thus, if the MPC is 0.63, the multiplier is 2.70, but it is 2.86 if the MPC is 0.65. If the government were to increase its expenditure by Rs. 1 to lift the economy out of a recession, income would eventually increase by Rs. 2.70 if the MPC were 0.63, but it would increase by Rs. 2.86 if the MPC were 0.65. And that difference may or may not be crucial to resuscitating the economy.
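The multiplier arithmetic above can be checked in a few lines; `income_multiplier` is an illustrative helper name, not something defined in the text.

```python
def income_multiplier(mpc: float) -> float:
    """Simple Keynesian income multiplier, 1 / (1 - MPC)."""
    if not 0 <= mpc < 1:
        raise ValueError("MPC must lie in [0, 1)")
    return 1.0 / (1.0 - mpc)


for mpc in (0.63, 0.65):
    # Eventual increase in income (Rs.) from a Rs. 1 increase in government expenditure
    print(f"MPC = {mpc:.2f}: multiplier = {income_multiplier(mpc):.2f}")
```

A difference of only 0.02 in the MPC moves the multiplier by about 0.16, which is the quantity whose economic importance has to be judged.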

The point of all this discussion is that one should not confuse statistical significance with practical, or 
economic, significance. As Goldberger notes: 
When a null, say, β_j = 1, is specified, the likely intent is that β_j is close to 1, so close that for all practical purposes it may be treated as if it were 1. But whether 1.1 is “practically the same as” 1.0 is a matter of economics, not of statistics. One cannot resolve the matter by relying on a hypothesis test, because the test statistic [t =] (b_j − 1)/σ̂_bj measures the estimated coefficient in standard error units, which are not meaningful units in which to measure the economic parameter β_j − 1. It may be a good idea to reserve the term “significance” for the statistical concept, adopting “substantial” for the economic concept.15


The point made by Goldberger is important. As sample size becomes very large, issues of statistical significance become much less important but issues of economic significance become critical. Indeed, since with very large samples almost any null hypothesis will be rejected, there may be studies in which the magnitude of the point estimates may be the only issue.
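To make this concrete, the sketch below holds an economically trivial gap between an estimate and its null value fixed and lets the sample size grow. It assumes, purely for illustration, that the standard error has the form sd/√n (as for a sample mean); the helper `t_stat` and all numbers are hypothetical.

```python
import math


def t_stat(estimate: float, null_value: float, sd: float, n: int) -> float:
    """t = (estimate - null) / se, with a stylized standard error se = sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    return (estimate - null_value) / se


# Hypothetical: the estimate exceeds the null value by only 0.01,
# with a per-observation standard deviation of 1.0.
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: t = {t_stat(1.01, 1.00, 1.0, n):.2f}")
```

The same 0.01 gap is statistically insignificant at n = 100 but gives a t value of about 10 at n = 1,000,000, while its economic size never changes.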


The Choice between Confidence-Interval and Test-of-Significance 
Approaches to Hypothesis Testing 


In most applied economic analyses, the null hypothesis is set up as a straw man and the objective of the empirical work is to knock it down, that is, reject the null hypothesis. Thus, in our consumption-income example, the null hypothesis that the MPC β₂ = 0 is patently absurd, but we often use it to dramatize the empirical results. Apparently editors of reputed journals do not find it exciting to publish an empirical piece that does not reject the null hypothesis. Somehow the finding that the MPC is statistically different from zero is more newsworthy than the finding that it is equal to, say, 0.6!

Thus, J. Bradford De Long and Kevin Lang argue that it is better for economists 


. . . to concentrate on the magnitudes of coefficients and to report confidence levels and not significance tests. If all or almost all null hypotheses are false, there is little point in concentrating on whether or not an estimate is indistinguishable from its predicted value under the null. Instead, we wish to cast light on what models are good approximations, which requires that we know ranges of parameter values that are excluded by empirical estimates.16


In short, these authors prefer the confidence-interval approach to the test-of-significance approach. The reader may want to keep this advice in mind.17


5.9 Regression Analysis and Analysis of Variance 


In this section we study regression analysis from the point of view of the analysis of variance and introduce 
the reader to an illuminating and complementary way of looking at the statistical inference problem. 
In Chapter 3, Section 3.5, we developed the following identity: 


$$\sum y_i^2 = \sum \hat{y}_i^2 + \sum \hat{u}_i^2 = \hat{\beta}_2^2 \sum x_i^2 + \sum \hat{u}_i^2 \qquad (3.5.2)$$


15Arthur S. Goldberger, A Course in Econometrics, Harvard University Press, Cambridge, Massachusetts, 1991, p. 240. Note that b_j is the OLS estimator of β_j and σ̂_bj is its standard error. For a corroborating view, see D. N. McCloskey, “The Loss Function Has Been Mislaid: The Rhetoric of Significance Tests,” American Economic Review, vol. 75, 1985, pp. 201-205. See also D. N. McCloskey and S. T. Ziliak, “The Standard Error of Regressions,” Journal of Economic Literature, vol. 34, 1996, pp. 97-114.


16See their article cited in footnote 13, p. 1271. 


17For a somewhat different perspective, see Carter Hill, William Griffiths, and George Judge, Undergraduate Econometrics, 
Wiley & Sons, New York, 2001, p. 108. 
that is, TSS = ESS + RSS, which decomposed the total sum of squares (TSS) into two components: explained
sum of squares (ESS) and residual sum of squares (RSS). A study of these components of TSS is known as 
the analysis of variance (ANOVA) from the regression viewpoint. 

Associated with any sum of squares is its df, the number of independent observations on which it is based. TSS has n − 1 df because we lose 1 df in computing the sample mean Ȳ. RSS has n − 2 df. (Why?) (Note: This is true only for the two-variable regression model with the intercept β₁ present.) ESS has 1 df (again true of the two-variable case only), which follows from the fact that ESS = β̂₂²Σxᵢ² is a function of β̂₂ only, since Σxᵢ² is known.


Table 5.3 ANOVA Table for the Two-Variable Regression Model

Source of Variation        SS*                  df       MSS†
Due to regression (ESS)    Σŷᵢ² = β̂₂²Σxᵢ²       1        β̂₂²Σxᵢ²
Due to residuals (RSS)     Σûᵢ²                 n − 2    Σûᵢ²/(n − 2) = σ̂²
TSS                        Σyᵢ²                 n − 1

*SS means sum of squares.
†Mean sum of squares, which is obtained by dividing SS by their df.


Let us arrange the various sums of squares and their associated df in Table 5.3, which is the standard form 
of the AOV table, sometimes called the ANOVA table. Given the entries of Table 5.3, we now consider the 
following variable: 

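$$F = \frac{\text{MSS of ESS}}{\text{MSS of RSS}} = \frac{\hat{\beta}_2^2 \sum x_i^2}{\sum \hat{u}_i^2/(n-2)} = \frac{\hat{\beta}_2^2 \sum x_i^2}{\hat{\sigma}^2} \qquad (5.9.1)$$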


If we assume that the disturbances uᵢ are normally distributed, which we do under the CNLRM, and if the null hypothesis (H₀) is that β₂ = 0, then it can be shown that the F variable of Equation 5.9.1 follows the F distribution with 1 df in the numerator and (n − 2) df in the denominator. (See Appendix 5A, Section 5A.3, for the proof. The general properties of the F distribution are discussed in Appendix A.)

What use can be made of the preceding F ratio? It can be shown18 that

$$E\left(\hat{\beta}_2^2 \sum x_i^2\right) = \sigma^2 + \beta_2^2 \sum x_i^2 \qquad (5.9.2)$$

and

$$E\left[\frac{\sum \hat{u}_i^2}{n-2}\right] = E(\hat{\sigma}^2) = \sigma^2 \qquad (5.9.3)$$

(Note that β₂ and σ² appearing on the right sides of these equations are the true parameters.) Therefore, if β₂ is in fact zero, Equations 5.9.2 and 5.9.3 both provide us with estimates of the same quantity, the true σ². In this situation,


18For proof, see K. A. Brownlee, Statistical Theory and Methodology in Science and Engineering, John Wiley & Sons, New York, 1960, pp. 278-280.
the explanatory variable X has no linear influence on Y whatsoever and the entire variation in Y is explained by the random disturbances uᵢ. If, on the other hand, β₂ is not zero, Eqs. (5.9.2) and (5.9.3) will be different and part of the variation in Y will be ascribable to X. Therefore, the F ratio of Eq. (5.9.1) provides a test of the null hypothesis H₀: β₂ = 0. Since all the quantities entering into this equation can be obtained from the available sample, this F ratio provides a test statistic to test the null hypothesis that true β₂ is zero. All that needs to be done is to compute the F ratio and compare it with the critical F value obtained from the F tables at the chosen level of significance, or obtain the p value of the computed F statistic.
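The ANOVA decomposition and the F ratio of Eq. (5.9.1) can be computed directly from raw data. The sketch below assumes NumPy and SciPy are installed; the function name `anova_two_variable` and the data in the usage line are hypothetical. It works with deviations from the means, so ESS = β̂₂²Σxᵢ² exactly as in Table 5.3.

```python
import numpy as np
from scipy import stats


def anova_two_variable(x, y):
    """ANOVA decomposition and F test of H0: beta2 = 0 for Y_i = b1 + b2*X_i + u_i."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    xd, yd = x - x.mean(), y - y.mean()   # deviations x_i and y_i
    b2 = (xd @ yd) / (xd @ xd)            # OLS slope
    resid = yd - b2 * xd                  # OLS residuals u_hat_i
    tss = yd @ yd                         # total SS, n - 1 df
    ess = b2**2 * (xd @ xd)               # explained SS, 1 df
    rss = resid @ resid                   # residual SS, n - 2 df
    f = (ess / 1) / (rss / (n - 2))       # Eq. (5.9.1)
    p = stats.f.sf(f, 1, n - 2)           # upper-tail p value of the F statistic
    return {"TSS": tss, "ESS": ess, "RSS": rss, "F": f, "p": p}


# Hypothetical data, for illustration only
result = anova_two_variable([1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.2])
print(result)
```

Computing RSS from the residuals, rather than as TSS − ESS, lets the identity TSS = ESS + RSS serve as a numerical check.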


Table 5.4 ANOVA Table for the Wages-Education Example

Source of Variation        SS          df    MSS
Due to regression (ESS)    95.4255     1     95.4255
Due to residuals (RSS)     9.6928      11    0.8811
TSS                        105.1183    12

F = 95.4255/0.8811 = 108.3026


To illustrate, let us continue with our wages-education example. The ANOVA table for this example is as shown in Table 5.4. The computed F value is seen to be 108.3026. The p value of this F statistic corresponding to 1 and 11 df cannot be obtained from the F table given in Appendix D, but by using electronic statistical tables it can be shown that the p value is about 0.0000005, an extremely small probability indeed. If you decide to choose the level-of-significance approach to hypothesis testing and fix α at 0.01, or a 1 percent level, you can see that the computed F of 108.3026 is obviously significant at this level. Therefore, if we reject the null hypothesis that β₂ = 0, the probability of committing a Type I error is very small. For all practical purposes, our sample could not have come from a population with a zero β₂ value, and we can conclude with great confidence that X, education, does affect Y, average wages.
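As a check on Table 5.4, the F ratio and its p value can be recomputed from the reported sums of squares. This sketch assumes SciPy is installed and uses `scipy.stats.f.sf`, the upper-tail probability of the F distribution. Because the sums of squares are rounded to four decimals, agreement with 108.3026 is only up to rounding.

```python
from scipy import stats

ess, rss = 95.4255, 9.6928  # sums of squares from Table 5.4
df_num, df_den = 1, 11

msr = ess / df_num          # mean square due to regression
mse = rss / df_den          # mean square due to residuals (sigma-hat squared)
F = msr / mse               # Eq. (5.9.1)
p_value = stats.f.sf(F, df_num, df_den)

print(f"F = {F:.2f}, p = {p_value:.1e}")
```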

Refer to Theorem 5.7 of Appendix 5A.1, which states that the square of the t value with k df is an F value with 1 df in the numerator and k df in the denominator. For our example, if we assume H₀: β₂ = 0, then from Eq. (5.3.2) it can be easily verified that the estimated t value is 10.41. This t value has 11 df. Under the same null hypothesis, the F value was 108.3026 with 1 and 11 df. Hence (10.41)² ≈ F value, except for rounding errors.

Thus, the t and the F tests provide us with two alternative but complementary ways of testing the null hypothesis that β₂ = 0. If this is the case, why not just rely on the t test and not worry about the F test and the accompanying analysis of variance? For the two-variable model there really is no need to resort to the F test. But when we consider the topic of multiple regression we will see that the F test has several interesting applications that make it a very useful and powerful method of testing statistical hypotheses.
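In the two-variable model the equality t² = F for H₀: β₂ = 0 is an algebraic identity, since t² = β̂₂²Σxᵢ²/σ̂², which is exactly Eq. (5.9.1). The sketch below assumes NumPy and SciPy are installed and uses hypothetical data; it computes both statistics from the same regression and their p values.

```python
import numpy as np
from scipy import stats

# Hypothetical data, for illustration only
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.3, 2.1, 4.0, 3.6, 5.9, 5.2, 7.4, 6.8])

n = len(y)
xd, yd = x - x.mean(), y - y.mean()
b2 = (xd @ yd) / (xd @ xd)                 # OLS slope
resid = yd - b2 * xd                       # OLS residuals
sigma2_hat = (resid @ resid) / (n - 2)     # unbiased estimator of sigma^2
se_b2 = np.sqrt(sigma2_hat / (xd @ xd))    # standard error of b2

t = b2 / se_b2                             # t statistic for H0: beta2 = 0
F = (b2**2 * (xd @ xd)) / sigma2_hat       # F statistic, Eq. (5.9.1)

print(f"t^2 = {t**2:.6f}, F = {F:.6f}")
print(2 * stats.t.sf(abs(t), n - 2), stats.f.sf(F, 1, n - 2))  # same p value
```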


5.10 Application of Regression Analysis: The Problem of Prediction 


On the basis of the sample data of Table 3.2 we obtained the following sample regression: 
$$\hat{Y}_i = -0.0144 + 0.7240X_i \qquad (3.6.1)$$


where Ŷᵢ is the estimator of true E(Yᵢ) corresponding to given X. What use can be made of this historical regression? One use is to “predict” or “forecast” the future mean wages Y corresponding to some given level of education X. Now there are two kinds of predictions: (1) prediction of the conditional mean value of Y corresponding to a chosen X, say, X₀, that is, the point on the population regression line itself (see Figure 2.2), and (2) prediction of an individual Y value corresponding to X₀. We shall call these two predictions the mean prediction and individual prediction.
Mean Prediction19


To fix the ideas, assume that X₀ = 20 and we want to predict E(Y | X₀ = 20). Now it can be shown that the historical regression in Eq. (3.6.1) provides the point estimate of this mean prediction as follows:

$$\hat{Y}_0 = \hat{\beta}_1 + \hat{\beta}_2 X_0 = -0.0144 + 0.7240(20) = 14.4656 \qquad (5.10.1)$$

where Ŷ₀ = estimator of E(Y | X₀). It can be proved that this point predictor is a best linear unbiased estimator (BLUE).

Since Ŷ₀ is an estimator, it is likely to be different from its true value. The difference between the two values will give some idea about the prediction or forecast error. To assess this error, we need to find out the sampling distribution of Ŷ₀. It is shown in Appendix 5A, Section 5A.4, that Ŷ₀ in Equation 5.10.1 is normally distributed with mean (β₁ + β₂X₀) and with variance given by the following formula:

$$\operatorname{var}(\hat{Y}_0) = \sigma^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2}\right] \qquad (5.10.2)$$
By replacing the unknown σ² by its unbiased estimator σ̂², we see that the variable

$$t = \frac{\hat{Y}_0 - (\beta_1 + \beta_2 X_0)}{\operatorname{se}(\hat{Y}_0)} \qquad (5.10.3)$$

follows the t distribution with n − 2 df. The t distribution can therefore be used to derive confidence intervals for the true E(Y₀ | X₀) and test hypotheses about it in the usual manner, namely,


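$$\Pr\left[\hat{\beta}_1 + \hat{\beta}_2 X_0 - t_{\alpha/2}\,\operatorname{se}(\hat{Y}_0) \le \beta_1 + \beta_2 X_0 \le \hat{\beta}_1 + \hat{\beta}_2 X_0 + t_{\alpha/2}\,\operatorname{se}(\hat{Y}_0)\right] = 1 - \alpha \qquad (5.10.4)$$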


where se(Ŷ₀) is obtained from Eq. (5.10.2). For our data (see Table 3.2),

$$\operatorname{var}(\hat{Y}_0) = 0.8936\left[\frac{1}{13} + \frac{(20 - 12)^2}{182}\right] = 0.3826$$

and

$$\operatorname{se}(\hat{Y}_0) = 0.6185$$

Therefore, the 95 percent confidence interval for true E(Y | X₀) = β₁ + β₂X₀ is given by

$$14.4656 - 2.201(0.6185) \le E(Y_0 \mid X_0 = 20) \le 14.4656 + 2.201(0.6185)$$


19For the proofs of the various statements made, see App. 5A, Sec. 5A.4. 
that is,

$$13.1043 \le E(Y \mid X = 20) \le 15.8269 \qquad (5.10.5)$$

Thus, given X₀ = 20, in repeated sampling, 95 out of 100 intervals like Equation 5.10.5 will include the true mean value; the single best estimate of the true mean value is of course the point estimate 14.4656.
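The mean-prediction calculation can be reproduced from the summary quantities used above (n = 13, X̄ = 12, Σxᵢ² = 182, σ̂² = 0.8936). This is a sketch assuming SciPy for the critical t value; differences in the last decimal place relative to Eq. (5.10.5) are due to rounding of σ̂² and the standard error.

```python
import math

from scipy import stats

# Summary quantities for the wages-education data, as used in the text
b1, b2 = -0.0144, 0.7240          # Eq. (3.6.1)
sigma2_hat = 0.8936               # estimated error variance
n, x_bar, sum_x2 = 13, 12.0, 182.0
x0 = 20.0

y0_hat = b1 + b2 * x0                                         # Eq. (5.10.1)
var_mean = sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / sum_x2)  # Eq. (5.10.2)
se_mean = math.sqrt(var_mean)
t_crit = stats.t.ppf(0.975, n - 2)                            # about 2.201 for 11 df

lower, upper = y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean
print(f"Y0_hat = {y0_hat:.4f}, se = {se_mean:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```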

If we obtain 95 percent confidence intervals like Eq. (5.10.5) for each of the X values given in Table 3.2, 
we obtain what is known as the confidence interval, or confidence band, for the population regression 
function, which is shown in Figure 5.6. 


[Plot of mean wage against education, showing the confidence interval for mean Y and the confidence interval for individual Y.]

Figure 5.6 Confidence intervals (bands) for mean Y and individual Y values.


Individual Prediction


If our interest lies in predicting an individual Y value, Y₀, corresponding to a given X value, say, X₀, then, as shown in Appendix 5A, Section 5A.4, a best linear unbiased estimator of Y₀ is also given by Eq. (5.10.1), but its variance is as follows:

$$\operatorname{var}(Y_0 - \hat{Y}_0) = E[Y_0 - \hat{Y}_0]^2 = \sigma^2\left[1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2}\right] \qquad (5.10.6)$$

It can be shown further that Y₀ also follows the normal distribution with mean and variance given by Eqs. (5.10.1) and (5.10.6), respectively. Substituting σ̂² for the unknown σ², it follows that

$$t = \frac{Y_0 - \hat{Y}_0}{\operatorname{se}(Y_0 - \hat{Y}_0)}$$
also follows the t distribution. Therefore, the t distribution can be used to draw inferences about the true Y₀. Continuing with our example, we see that the point prediction of Y₀ is 14.4656, the same as that of Ŷ₀, and its variance is about 1.2762, that is, σ̂² + var(Ŷ₀) = 0.8936 + 0.3826 (the reader should verify this calculation). Therefore, the 95 percent confidence interval for Y₀ corresponding to X₀ = 20 is seen to be

$$11.9792 \le Y_0 \mid X_0 = 20 \le 16.9520 \qquad (5.10.7)$$


Comparing this interval with Eq. (5.10.5), we see that the confidence interval for individual Y₀ is wider than that for the mean value of Y₀. (Why?) Computing confidence intervals like Equation 5.10.7 conditional upon the X values given in Table 3.2, we obtain the 95 percent confidence band for the individual Y values corresponding to these X values. This confidence band, along with the confidence band for the mean value of Y associated with the same X's, is shown in Figure 5.6.

Notice an important feature of the confidence bands shown in Figure 5.6. The width of these bands is smallest when X₀ = X̄. (Why?) However, the bands widen sharply as X₀ moves away from X̄. (Why?) This change would suggest that the predictive ability of the historical sample regression line falls markedly as X₀ departs progressively from X̄. Therefore, one should exercise great caution in “extrapolating” the historical regression line to predict E(Y | X₀) or Y₀ associated with a given X₀ that is far removed from the sample mean X̄.


5.11 Reporting the Results of Regression Analysis 


There are various ways of reporting the results of regression analysis, but in this text we shall use the following 
format, employing the wages-education example of Chapter 3 as an illustration: 


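$$\begin{aligned}
\hat{Y}_i &= -0.0144 + 0.7240X_i \\
\text{se} &= (0.9317)\quad (0.0700) \qquad r^2 = 0.9065 \\
t &= (-0.0154)\quad (10.3428) \qquad \text{df} = 11 \\
p &= (0.9879)\quad (0.000) \qquad F_{1,11} = 108.30
\end{aligned} \qquad (5.11.1)$$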


In Equation 5.11.1 the figures in the first set of parentheses are the estimated standard errors of the regression coefficients, the figures in the second set are estimated t values computed from Eq. (5.3.2) under the null hypothesis that the true population value of each regression coefficient individually is zero (e.g., 10.3428 = 0.7240/0.0700), and the figures in the third set are the estimated p values. Thus, for 11 df the probability of obtaining a t value of 10.3428 or greater is less than 0.000001, which is practically zero.

By presenting the p values of the estimated t values, we can see at once the exact level of significance of each estimated t value. Thus, under the null hypothesis that the true population slope value is zero (that is, education has no effect on mean wages), the exact probability of obtaining a t value of 10.3428 or greater is practically zero. Recall that the smaller the p value, the smaller the probability of making a mistake if we reject the null hypothesis.

Earlier we showed the intimate connection between the F and t statistics, namely, F₁,ₖ = tₖ². Under the null hypothesis that the true β₂ = 0, Eq. (5.11.1) shows that the F value is 108.30 (for 1 numerator and 11 denominator df) and the t value is about 10.34 (11 df); as expected, the former value is the square of the latter value, except for the round-off errors. The ANOVA table for this problem has already been discussed.
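The t and p entries of Eq. (5.11.1) follow mechanically from the coefficients and their standard errors. A sketch, assuming SciPy is installed, of how the t values and two-tail p values are obtained under the null hypothesis that each true coefficient is zero:

```python
from scipy import stats

df = 11  # n - 2 for the wages-education regression
# (estimate, standard error) pairs as reported in Eq. (5.11.1)
coefficients = {"intercept": (-0.0144, 0.9317), "education": (0.7240, 0.0700)}

results = {}
for name, (estimate, se) in coefficients.items():
    t = estimate / se                  # t under H0: true coefficient = 0
    p = 2 * stats.t.sf(abs(t), df)     # two-tail p value
    results[name] = (t, p)
    print(f"{name:>9}: t = {t:8.4f}, p = {p:.4g}")
```

The intercept's p value is close to 1, while the slope's p value is far below any conventional significance level.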
5.12 Evaluating the Results of Regression Analysis


In Figure 1.4 of the Introduction we sketched the anatomy of econometric modeling. Now that we have 
presented the results of regression analysis of our wages-education example in Eq. (5.11.1), we would like to 
question the adequacy of the fitted model. How “good” is the fitted model? We need some criteria with which 
to answer this question. 

First, are the signs of the estimated coefficients in accordance with theoretical or prior expectations? A priori, β₂ in the wages-education example should be positive. In the present example it is. Second, if theory says that the relationship should be not only positive but also statistically significant, is this the case in the present application? As we discussed in Section 5.11, the education coefficient is not only positive but also statistically significantly different from zero; the p value of the estimated t value is extremely small. The same cannot be said of the intercept coefficient, whose p value of about 0.99 shows that it is not statistically different from zero. Third, how well does the regression model explain variation in our example? One can use r² to answer this question. In the present example r² is about 0.90, which is a very high value considering that r² can be at most 1.

Thus, the model we have chosen for explaining mean wages seems quite good. But before we sign off, we would like to find out whether our model satisfies the assumptions of CNLRM. We will not look at the various assumptions now because the model is patently so simple. But there is one assumption that we would like to check, namely, the normality of the disturbance term, uᵢ. Recall that the t and F tests used before require that the error term follow the normal distribution. Otherwise, the testing procedure will not be valid in small, or finite, samples.


Normality Tests 


Although several tests of normality are discussed in the literature, we will consider just three: (1) histogram of residuals; (2) normal probability plot (NPP), a graphical device; and (3) the Jarque-Bera test.


Histogram of Residuals 


A histogram of residuals is a simple graphic device that is used to learn something about the shape of the probability density function (PDF) of a random variable. On the horizontal axis, we divide the values of the variable of interest (e.g., OLS residuals) into suitable intervals, and in each class interval we erect rectangles equal in height to the number of observations (i.e., frequency) in that class interval. If you mentally superimpose the bell-shaped normal distribution curve on the histogram, you will get some idea as to whether a normal PDF approximation may be appropriate. For the wages-education regression, the histogram of the residuals is as shown in Figure 5.7.

This diagram shows that the residuals are not perfectly normally distributed; for a normally distributed 
variable the skewness (a measure of symmetry) should be zero and kurtosis (which measures how tall or 
squatty the normal distribution is) should be 3. 

But it is always a good practice to plot the histogram of residuals from any regression as a rough and ready 
method of testing for the normality assumption. 
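Skewness and kurtosis can be computed from residuals with the moment-based definitions S = m₃/m₂^(3/2) and K = m₄/m₂², where m_r is the rth central moment. The sketch below assumes NumPy is installed and checks the definitions on simulated normal draws; `skewness_kurtosis` is an illustrative helper, and statistical packages sometimes apply small-sample corrections, so their reported values may differ slightly.

```python
import numpy as np


def skewness_kurtosis(residuals):
    """Moment-based skewness S and kurtosis K (S = 0 and K = 3 for a normal variable)."""
    e = np.asarray(residuals, dtype=float)
    d = e - e.mean()
    m2 = np.mean(d**2)  # second central moment
    m3 = np.mean(d**3)  # third central moment
    m4 = np.mean(d**4)  # fourth central moment
    return m3 / m2**1.5, m4 / m2**2


rng = np.random.default_rng(0)
normal_draws = rng.standard_normal(100_000)  # simulated data, for illustration only
S, K = skewness_kurtosis(normal_draws)
print(f"S = {S:.3f}, K = {K:.3f}")  # close to 0 and 3
```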


Normal Probability Plot 


A comparatively simple graphical device to study the shape of the probability density function (PDF) of a random variable is the normal probability plot (NPP), which makes use of normal probability paper, a specially designed graph paper. On the horizontal, or X, axis, we plot values of the variable of interest (say, OLS residuals, ûᵢ), and on the vertical, or Y, axis, we show the expected value of this variable if it were normally distributed. Therefore, if the variable is in fact from the normal population, the NPP will be approximately a straight line. The NPP of the residuals from our wages-education regression is shown in Figure 5.8, which is obtained from the MINITAB software package, version 15. As noted earlier, if the fitted line in the NPP is approximately a straight line, one can conclude that the variable of interest is normally distributed. In Figure 5.8, we see that residuals from our illustrative example are approximately normally distributed, because a straight line seems to fit the data reasonably well.

Figure 5.7 Histogram of residuals for wages-education data.

MINITAB also produces the Anderson-Darling normality test, known as the A² statistic. The underlying null hypothesis is that the variable under consideration is normally distributed. As Figure 5.8 shows, for our example, the computed A² statistic is 0.289. The p value of obtaining such a value of A² is 0.558, which is reasonably high. Therefore, we do not reject the hypothesis that the residuals from our illustrative example are normally distributed. Incidentally, Figure 5.8 shows the parameters of the (normal) distribution: the mean is approximately 0, and the standard deviation is about 0.8987.


Jarque-Bera (JB) Test of Normality20


The JB test of normality is an asymptotic, or large-sample, test. It is also based on the OLS residuals. This test first computes the skewness and kurtosis (discussed in Appendix A) measures of the OLS residuals and uses the following test statistic:

$$JB = n\left[\frac{S^2}{6} + \frac{(K - 3)^2}{24}\right] \qquad (5.12.1)$$


where n = sample size, S = skewness coefficient, and K = kurtosis coefficient. For a normally distributed 
variable, S = 0 and K = 3. Therefore, the JB test of normality is a test of the joint hypothesis that S and K are 
0 and 3, respectively. In that case the value of the JB statistic is expected to be 0. 

Under the null hypothesis that the residuals are normally distributed, Jarque and Bera showed that asymptotically (i.e., in large samples) the JB statistic given in Equation (5.12.1) follows the chi-square distribution with 2 df. If the computed p value of the JB statistic in an application is sufficiently low, which will happen if the value of the statistic is very different from 0, one can reject the hypothesis that the residuals are normally distributed. But if the p value is reasonably high, which will happen if the value of the statistic is close to zero, we do not reject the normality assumption.

20See C. M. Jarque and A. K. Bera, “A Test for Normality of Observations and Regression Residuals,” International Statistical Review, vol. 55, 1987, pp. 163-172.

Figure 5.8 Residuals from wages-education regression.
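Eq. (5.12.1) and its p value are easy to compute without special software, because the upper tail of a chi-square distribution with 2 df is exp(−x/2). The sketch below uses only the Python standard library, with the skewness, kurtosis, and sample size reported for the food expenditure residuals in Figure 5.9 later in this section; `jarque_bera` is an illustrative helper name. It also converts the wages-education JB statistic of 0.8286 into its p value.

```python
import math


def jarque_bera(S: float, K: float, n: int):
    """JB statistic of Eq. (5.12.1) and its asymptotic chi-square(2) p value."""
    jb = n * (S**2 / 6 + (K - 3) ** 2 / 24)
    p = math.exp(-jb / 2)  # Pr(chi-square with 2 df > jb)
    return jb, p


# Skewness, kurtosis, and n for the food expenditure residuals (Figure 5.9)
jb, p = jarque_bera(0.119816, 3.234473, 55)
print(f"JB = {jb:.4f}, p = {p:.4f}")  # JB = 0.2576, p = 0.8792

p_wages = math.exp(-0.8286 / 2)  # p value for the wages-education JB statistic
print(f"wages-education p = {p_wages:.2f}")
```

The first line reproduces the Jarque-Bera statistic and probability reported in Figure 5.9.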

For our wages-education example, the estimated JB statistic is 0.8286. The null hypothesis that the residuals in the present example are normally distributed cannot be rejected, for the p value of obtaining a JB statistic as much as 0.8286 or greater is about 0.66, or 66 percent. This probability is quite high. Note that although our regression has 13 observations, these observations were obtained from a sample of 528 observations, which seems reasonably large.


Other Tests of Model Adequacy 


Remember that the CNLRM makes many more assumptions than the normality of the error term. As we 
examine econometric theory further, we will consider several tests of model adequacy (see Chapter 13). Until 
then, keep in mind that our regression modeling is based on several simplifying assumptions that may not 
hold in each and every case. 


A Concluding Example Let us return to Example 3.2 about food expenditure in India. Using the data given in 
Equation (3.7.2) and adopting the format of Equation (5.11.1), we obtain the following expenditure equation: 


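$$\begin{aligned}
\text{FoodExp}_i &= 94.2087 + 0.4368\,\text{TotalExp}_i \\
\text{se} &= (50.8563)\quad (0.0783) \\
t &= (1.8524)\quad (5.5770) \\
p &= (0.0695)\quad (0.0000)^{*} \\
r^2 &= 0.3698; \qquad \text{df} = 53 \\
F_{1,53} &= 31.1034 \qquad (p\ \text{value} = 0.0000)^{*}
\end{aligned} \qquad (5.12.2)$$

where * denotes extremely small.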
First, let us interpret this regression. As expected, there is a positive relationship between expenditure on food and total expenditure. If total expenditure went up by a rupee, on average, expenditure on food increased by about 44 paise. If total expenditure were zero, the average expenditure on food would be about 94 rupees. Of course, this mechanical interpretation of the intercept may not make much economic sense. The r² value of about 0.37 means that 37 percent of the variation in food expenditure is explained by total expenditure, a proxy for income.

Suppose we want to test the null hypothesis that there is no relationship between food expenditure and total expenditure, that is, the true slope coefficient β₂ = 0. The estimated value of β₂ is 0.4368. If the null hypothesis were true, what is the probability of obtaining a value as large as 0.4368? Under the null hypothesis, we observe from Eq. (5.12.2) that the t value is 5.5770 and the p value of obtaining such a t value is practically zero. In other words, we can reject the null hypothesis resoundingly. But suppose the null hypothesis were that β₂ = 0.5. Now what? Using the t test we obtain:

$$t = \frac{0.4368 - 0.5}{0.0783} = -0.8071$$

The probability of obtaining a |t| of 0.8071 or greater is more than 20 percent. Hence we do not reject the hypothesis that the true β₂ is 0.5.
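The two t tests in this example differ only in the null value subtracted from the estimate. A sketch, assuming SciPy is installed, using the estimate and standard error from Eq. (5.12.2); small differences from the reported t of 5.5770 reflect rounding of the standard error.

```python
from scipy import stats

b2, se_b2, df = 0.4368, 0.0783, 53  # slope, its standard error, and df from Eq. (5.12.2)

for beta2_null in (0.0, 0.5):
    t = (b2 - beta2_null) / se_b2    # t statistic under H0: beta2 = beta2_null
    p = 2 * stats.t.sf(abs(t), df)   # two-tail p value
    print(f"H0: beta2 = {beta2_null}: t = {t:.4f}, p = {p:.4f}")
```

After the loop, `t` and `p` hold the results for H₀: β₂ = 0.5, whose p value is well above 0.20.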

Notice that, under the null hypothesis that the true slope coefficient is zero, the F value is 31.1034, as shown in Eq. (5.12.2). Under the same null hypothesis, we obtained a t value of 5.5770. If we square this value, we obtain 31.1029, which is about the same as the F value, again showing the close relationship between the t and the F statistics. (Note: The numerator df for the F statistic must be 1, which is the case here.)
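In the two-variable model, F can be written in terms of either t or r², because ESS = r²·TSS and RSS = (1 − r²)·TSS. As a worked check with the rounded figures of Eq. (5.12.2) (small discrepancies are due to rounding):

$$
F = \frac{\text{ESS}/1}{\text{RSS}/(n-2)} = \frac{r^2}{(1 - r^2)/(n-2)} = t^2
$$

$$
\frac{0.3698 \times 53}{1 - 0.3698} \approx 31.10, \qquad (5.5770)^2 \approx 31.10, \qquad F_{1,53} = 31.1034
$$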

Using the estimated residuals from the regression, what can we say about the probability distribution of the error term? The information is given in Figure 5.9. As the figure shows, the residuals from the food expenditure regression seem to be symmetrically distributed. Application of the Jarque-Bera test shows that the JB statistic is about 0.2576, and the probability of obtaining such a statistic under the normality assumption is about 88 percent. Therefore, we do not reject the hypothesis that the error terms are normally distributed. But keep in mind that the sample size of 55 observations may not be large enough.
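The JB statistic can be recomputed from the skewness and kurtosis printed with Figure 5.9. The sketch assumes the usual formula JB = n[S²/6 + (K − 3)²/24]. It also assumes JB's asymptotic chi-square distribution with 2 df, whose upper-tail probability is simply exp(−JB/2).

```python
import math


def jarque_bera(n: int, skewness: float, kurtosis: float) -> tuple:
    """Return (JB, p value) with JB = n * [S**2 / 6 + (K - 3)**2 / 24].

    Under normality JB is asymptotically chi-square with 2 df, whose
    upper-tail probability has the closed form exp(-x / 2).
    """
    jb = n * (skewness ** 2 / 6 + (kurtosis - 3) ** 2 / 24)
    return jb, math.exp(-jb / 2)


# n, skewness, and kurtosis of the residuals as reported in Figure 5.9.
jb, p = jarque_bera(55, 0.119816, 3.234473)
print(f"JB = {jb:.4f}, p value = {p:.4f}")
```

To rounding, this reproduces the reported JB value of about 0.2576 and the probability of about 0.88. Because the chi-square approximation is asymptotic, the caution about a sample of only 55 observations still applies.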


Figure 5.9 Residuals from the food expenditure regression.

(Histogram of the residuals; series: residuals, sample 1 to 55, 55 observations. Mean ≈ 0; median 7.747849; maximum 171.5859; minimum −153.7664; std. dev. 66.23382; skewness 0.119816; kurtosis 3.234473; Jarque-Bera 0.257585; probability 0.879156.)


We leave it to the reader to establish confidence intervals for the two regression coefficients as well as to obtain the 
normal probability plot and do mean and individual predictions. 
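As a sketch of the first of these tasks, the 95 percent intervals can be computed with the standard formula β̂ ± t_{α/2} se(β̂) and n − 2 = 53 df. The coefficient values are those of Eq. (5.12.2). The critical value is located numerically (Simpson's rule plus bisection) rather than read from a t table. That is an illustrative choice, not the book's procedure.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def central_prob(c: float, df: int, steps: int = 2000) -> float:
    """P(-c < T < c), by Simpson's rule on [0, c] (steps must be even)."""
    h = c / steps
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 2.0 * total * h / 3.0


def t_critical(alpha: float, df: int) -> float:
    """Two-sided critical value t_{alpha/2}, located by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if central_prob(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


df = 53
t_c = t_critical(0.05, df)
intervals = {}
for name, est, se in [("beta1", 94.2087, 50.8563), ("beta2", 0.4368, 0.0783)]:
    intervals[name] = (est - t_c * se, est + t_c * se)
    print(f"95% CI for {name}: ({intervals[name][0]:.4f}, {intervals[name][1]:.4f})")
print(f"critical t (53 df): {t_c:.4f}")
```

With 53 df the critical value comes out at roughly 2.006. The interval for β₂ contains 0.5 but excludes zero. The interval for β₁ contains zero. Both results agree with the t tests discussed earlier in this example.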


Summary and Conclusions 


1. Estimation and hypothesis testing constitute the two main branches of classical statistics. Having discussed the problem of estimation in Chapters 3 and 4, we have taken up the problem of hypothesis testing in this chapter.

2. Hypothesis testing answers this question: Is a given finding compatible with a stated hypothesis or not?

3. There are two mutually complementary approaches to answering the preceding question: confidence interval and test of significance.

4. Underlying the confidence-interval approach is the concept of interval estimation. An interval estimator is an interval or range constructed in such a manner that it has a specified probability of including within its limits the true value of the unknown parameter. The interval thus constructed is known as a confidence interval, which is often stated in percent form, such as 90 or 95 percent. The confidence interval provides a set of plausible hypotheses about the value of the unknown parameter. If the null-hypothesized value lies in the confidence interval, the hypothesis is not rejected, whereas if it lies outside this interval, the null hypothesis can be rejected.

5. In the significance test procedure, one develops a test statistic and examines its sampling distribution under the null hypothesis. The test statistic usually follows a well-defined probability distribution such as the normal, t, F, or chi-square. Once a test statistic (e.g., the t statistic) is computed from the data at hand, its p value can be easily obtained. The p value gives the exact probability of obtaining a test statistic at least as extreme as the estimated one if the null hypothesis is true. If this p value is small, one can reject the null hypothesis, but if it is large one may not reject it. What constitutes a small or large p value is up to the investigator. In choosing the p value the investigator has to bear in mind the probabilities of committing Type I and Type II errors.

6. In practice, one should be careful in fixing α, the probability of committing a Type I error, at arbitrary values such as 1, 5, or 10 percent. It is better to quote the p value of the test statistic. Also, the statistical significance of an estimate should not be confused with its practical significance.

7. Of course, hypothesis testing presumes that the model chosen for empirical analysis is adequate in the sense that it does not violate one or more assumptions underlying the classical normal linear regression model. Therefore, tests of model adequacy should precede tests of hypothesis. This chapter introduced one such test, the normality test, to find out whether the error term follows the normal distribution. Since in small, or finite, samples the t, F, and chi-square tests require the normality assumption, it is important that this assumption be checked formally.

8. If the model is deemed practically adequate, it may be used for forecasting purposes. But in forecasting the future values of the regressand, one should not go too far out of the sample range of the regressor values. Otherwise, forecasting errors can increase dramatically.
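Why forecast errors grow away from the data can be seen from the standard variance formulas for the two-variable model. The first is for the mean prediction and the second for the individual prediction. The notation is the usual one: xᵢ = Xᵢ − X̄, and X₀ is the regressor value at which the forecast is made. Both variances rise with the squared distance (X₀ − X̄)².

$$
\operatorname{var}(\hat{Y}_0) = \sigma^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2}\right],
\qquad
\operatorname{var}(Y_0 - \hat{Y}_0) = \sigma^2\left[1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2}\right]
$$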




Multiple Choice Questions 


1. Reliability of a point estimation is measured by its 
a. Standard deviation 
b. Standard normal curve 
c. Standard error 
d. Coefficient of determination 
2. Rejecting a true hypothesis results in this type of error 
a. Type I error 
b. Type II error 
c. Structural error 
d. Hypothesis error 
3. Accepting a false hypothesis results in this type of error
a. Type I error 
b. Type II error
c. Structural error 
d. Hypothesis error 
4. The end points of the confidence interval (β̂₂ ± δ) are known as
a. Critical error
b. Confidence limit
c. Confidence value
d. Limiting value
5. The α in a confidence interval given by Pr(β̂₂ − δ ≤ β₂ ≤ β̂₂ + δ) = 1 − α is known as
a. Confidence coefficient 
b. Level of confidence 
c. Level of significance 
d. Significance coefficient 
6. The (1 − α) in a confidence interval given by Pr(β̂₂ − δ ≤ β₂ ≤ β̂₂ + δ) = 1 − α is known as
a. Confidence coefficient 
b. Level of confidence 
c. Level of significance 
d. Significance coefficient 


7. The α in a confidence interval given by Pr(β̂₂ − δ ≤ β₂ ≤ β̂₂ + δ) = 1 − α should be

a. < 0

b. > 0

a | 

d. > 0 and < 1
8. In confidence interval estimation, α = 5%; this means that the interval includes the true β with a probability of

a. 5% 

b. 50% 

c. 95% 

d. 45% 
9. The confidence interval constructed for β₂ will be the same irrespective of the sample analyzed. This statement is

a. True 

b. False 

c. May be true 

d. Nonsense statement 
10. The larger the standard error of the estimator, the greater is the uncertainty of estimating the true value of the unknown parameters. This statement is

a. True 

b. False 

c. May be true 

d. Nonsense statement 
11. Standard error of an estimator is a measure of

a. Population estimator 

b. Precision of the estimator 


c. Power of the estimator

d. Confidence interval of the estimator 
12. The sample variance estimator σ̂² follows

a. t-distribution

b. normal distribution

c. F-distribution

d. Chi-square distribution
13. The sample parameter estimator β̂₂ follows

a. t-distribution

b. normal distribution 

c. F-distribution 

d. Chi-square distribution 
14. For H0: β₂ = 0; H1: β₂ ≠ 0, this is

a. One-sided hypothesis test 

b. Two-sided hypothesis test 

c. Open ended hypothesis test 

d. t-test 
15. For H0: β₂ = 0; H1: β₂ ≠ 0, this is

a. One-sided hypothesis test 

b. Two-sided hypothesis test 

c. Open ended hypothesis test 

d. t-test 


16. When we do not reject H0 for β₂, this means that the value of β₂ under H0 falls within the confidence interval defined by
a. (1 − α)%
b. (α + 1)%
c. 100(1 − α)%
d. (100 − α)%


17. When we reject the null hypothesis, our finding is said to be

a. 95% probability finding

b. 5% confidence finding

c. Not statistically significant

d. Statistically significant


18. In constructing a confidence interval of 100(1 − α)%, given α = 0.05, the probability of committing a Type II error is
a. 5%

b. 95%

c. 90%

d. Zero


19. Under a one-tail test, the confidence interval is constructed as
a. 100(1 − α)%
b. (100 − α)%
c. (1 − α)
d. α%
20. Under the test-of-significance approach (t-test), a statistic is said to be statistically significant if the value of the test statistic lies in the critical region. This statement is
a. True
b. False
c. Specific to α value
d. Specific to t-value
21. Under the test-of-significance approach (t-test), a statistic is said to be statistically insignificant if the value of the test statistic lies in the critical region. This statement is
a. True
b. False
c. Specific to α value
d. Specific to t-value
22. Given that the test statistic under the test-of-significance approach lies in the critical region, the null hypothesis is
a. Not rejected
b. Rejected
c. Rejection depends on α value
d. Rejection depends on t-value
23. We do not reject the null hypothesis when the test statistic under the test-of-significance approach lies in the
a. Rejection region
b. Critical region
c. Acceptance region
d. Null hypothesis region
24. The decision to use a two-tailed test of significance or a one-tailed test of significance depends on the
a. Researcher’s objectivity
b. Null hypothesis framed
c. Alternative hypothesis framed
d. Sample data
25. The chi-square test of significance is used to test the significance of the test statistic
a. β
b. σ²
ial Yad 
d. t 
26. In the “2-t” rule of thumb, 20 or more is the required
a. Sample size
b. Level of significance
c. Degrees of freedom
d. t-value
27. Given the sample size, if we try to reduce the probability of rejecting the true hypothesis, we at the same time increase the probability of accepting the false hypothesis. This statement is
a. True only for Type I error
b. True only for Type II error
c. Always true
d. Never true
28. The lowest significance level at which a null hypothesis can be rejected is determined by the
a. t-value

b. p-value

c. Significance test

d. Confidence interval
29. For the t-test and F-test, particularly in small samples, we require the error terms to follow

a. t-distribution 

b. F-distribution 

c. Normal distribution 

d. Chi-square distribution 
30. As a rule of thumb, to test H0 that the slope coefficient is equal to 2 at the 5% significance level, you should

a. Subtract 2 from the estimated coefficient, divide the difference by the standard error, and check if the resulting ratio is larger than 2 in absolute value

b. See if the slope coefficient is between 0.95 and 1.05

c. Check if the adjusted R² is close to 1

d. Divide the slope coefficient by the standard error and check if the resulting ratio is larger than 2 in absolute value

31. A researcher estimates the unknown demand curve Q = α + βP + ε and fits the regression line Q̂ = a − bP using the OLS technique. The following information is given to you: sample size = 10, TSS = 80, ESS = 44. Which of the following is correct?

a. RSS = 5

b. Given that a = 4 and b = 2, demand is price unitary elastic at P = 1.5

c. All coefficients are statistically significant at the 5% level of significance

RnS 
32. A researcher tested H0: μ = 20 against H1: μ = 25. Using the data, she reached a conclusion such that P(Type I error) = 0.0607. Which of the following is a correct statement?

a. H0 could not have been rejected

b. The significance level chosen for this test was definitely greater than 5%

c. The significance level chosen for this test was less than the p-value

d. The p-value of the test was less than 0.05
33. The Jarque-Bera test is a

a. Model misspecification test 

b. Residual normality test 

c. Test of unbiasedness of estimators 

d. Test of goodness of fit for the model 
34. Normality of the error terms is essential for hypothesis testing. This statement is

a. Always true

b. Sometimes true

c. Always false

d. Can’t say
35. If the estimated β̂₂ is equal to the hypothesized β₂, the t value will be equal to

a. 0
b. 1
c. 30
d. Standard error of β̂₂
Exercises


Questions 


5.1. State with reason whether the following statements are true, false, or uncertain. Be precise.

a. The t test of significance discussed in this chapter requires that the sampling distributions of estimators β̂₁ and β̂₂ follow the normal distribution.

b. Even though the disturbance term in the CLRM is not normally distributed, the OLS estimators are still unbiased.

c. If there is no intercept in the regression model, the estimated uᵢ (= ûᵢ) will not sum to zero.

d. The p value and the size of a test statistic mean the same thing.

e. In a regression model that contains the intercept, the sum of the residuals is always zero.

f. If a null hypothesis is not rejected, it is true.

g. The higher the value of σ², the larger is the variance of β̂₂ given in Eq. (3.3.1).

h. The conditional and unconditional means of a random variable are the same things.

i. In the two-variable PRF, if the slope coefficient β₂ is zero, the intercept β₁ is estimated by the sample mean Ȳ.

j. The conditional variance, var(Yᵢ | Xᵢ) = σ², and the unconditional variance of Y, var(Y) = σ_Y², will be the same if X had no influence on Y.

5.2. Set up the ANOVA table in the manner of Table 5.4 for the regression model given in Eq. (3.7.2) and test the hypothesis that there is no relationship between food expenditure and total expenditure in India.

5.3. Refer to the demand for cell phones regression given in Eq. (3.7.3).

a. Is the estimated intercept coefficient significant at the 5 percent level of significance? What is the null hypothesis you are testing?

b. Is the estimated slope coefficient significant at the 5 percent level? What is the underlying null hypothesis?

c. Establish a 95 percent confidence interval for the true slope coefficient.

d. What is the mean forecast value of cell phones demanded if the per capita income is $9,000? What is the 95 percent confidence interval for the forecast value?

5.4. Let ρ² represent the true population coefficient of determination. Suppose you want to test the hypothesis that ρ² = 0. Verbally explain how you would test this hypothesis. Hint: Use Eq. (3.5.11). See also Exercise 5.7.

5.5. What is known as the characteristic line of modern investment analysis is simply the regression line obtained from the following model:*

$$
r_{it} = \alpha_i + \beta_i r_{mt} + u_t
$$

where rᵢₜ = the rate of return on the ith security in time t
rₘₜ = the rate of return on the market portfolio in time t
uₜ = stochastic disturbance term

In this model βᵢ is known as the beta coefficient of the ith security, a measure of market (or systematic) risk of a security.


*See Haim Levy and Marshall Sarnat, Portfolio and Investment Selection: Theory and Practice, Prentice Hall International, 
Englewood Cliffs, NJ, 1984, Chap. 12. 
On the basis of 240 monthly rates of return for the period 1956-1976, Fogler and Ganapathy obtained the following characteristic line for IBM stock in relation to the market portfolio index developed at the University of Chicago:*

$$
\begin{aligned}
\hat{r}_{it} &= 0.7264 + 1.0598\,r_{mt} \qquad & r^2 &= 0.4710 \\
\text{se} &= (0.3001) \qquad (0.0728) & \text{df} &= 238 \\
& & F_{1,238} &= 211.896
\end{aligned}
$$


a. A security whose beta coefficient is greater than one is said to be a volatile or aggressive security. 
Was IBM a volatile security in the time period under study? 
b. Is the intercept coefficient significantly different from zero? If it is, what is its practical meaning? 
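5.6. Equation (5.3.5) can also be written as

$$
\Pr\left[\hat{\beta}_2 - t_{\alpha/2}\,\text{se}(\hat{\beta}_2) < \beta_2 < \hat{\beta}_2 + t_{\alpha/2}\,\text{se}(\hat{\beta}_2)\right] = 1 - \alpha
$$

That is, the weak inequality (≤) can be replaced by the strong inequality (<). Why?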

5.7. R. A. Fisher has derived the sampling distribution of the correlation coefficient defined in Eq. (3.5.13). If it is assumed that the variables X and Y are jointly normally distributed, that is, if they come from a bivariate normal distribution (see Appendix 4A, Exercise 4.1), then under the assumption that the population correlation coefficient ρ is zero, it can be shown that

$$
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
$$

follows Student’s t distribution with n − 2 df.** Show that this t value is identical with the t value given in Eq. (5.3.2) under the null hypothesis that β₂ = 0. Hence establish that under the same null hypothesis F = t². (See Section 5.9.)

5.8. Consider the following regression output:†

$$
\begin{aligned}
\hat{Y}_t &= 0.2033 + 0.6560\,X_t \\
\text{se} &= (0.0976) \qquad (0.1961) \\
r^2 &= 0.397 \qquad \text{RSS} = 0.0544 \qquad \text{ESS} = 0.0358
\end{aligned}
$$

where Y = labor force participation rate (LFPR) of women in 1972 and X = LFPR of women in 1968. The regression results were obtained from a sample of 19 cities in the United States.

a. How do you interpret this regression?

b. Test the hypothesis H0: β₂ = 1 against H1: β₂ > 1. Which test do you use? And why? What are the underlying assumptions of the test(s) you use?

c. Suppose that the LFPR in 1968 was 0.58 (or 58 percent). On the basis of the regression results given above, what is the mean LFPR in 1972? Establish a 95 percent confidence interval for the mean prediction.

d. How would you test the hypothesis that the error term in the population regression is normally distributed? Show the necessary calculations.


*H. Russell Fogler and Sundaram Ganapathy, Financial Econometrics, Prentice Hall, Englewood Cliffs, NJ, 1982, p. 13. 

**If ρ is in fact zero, Fisher has shown that r follows the same t distribution provided either X or Y is normally distributed. But if ρ is not equal to zero, both variables must be normally distributed. See R. L. Anderson and T. A. Bancroft, Statistical Theory in Research, McGraw-Hill, New York, 1952, pp. 87-88.

†Adapted from Samprit Chatterjee, Ali S. Hadi, and Bertram Price, Regression Analysis by Example, 3d ed., Wiley Interscience, New York, 2000, pp. 46-47.
Table 5.5 Average Salary and Per Pupil Spending (dollars), 1985

Observation Salary Spending Observation Salary Spending
1 19,583 3346 27 22,795 3366
2 20,263 3114 28 21,570 2920
3 20,325 3554 29 22,080 2980
4 26,800 4642 30 22,250 3731
5 29,470 4669 31 20,940 2853
6 26,610 4888 32 21,800 2555
7 30,678 5710 33 22,934 2729
8 27,170 5536 34 18,443 2305
9 25,853 4168 35 19,538 2642
10 24,500 3547 36 20,460 3124
11 24,274 3159 37 21,419 2752
12 27,170 3621 38 25,160 3429
13 30,168 3782 39 22,482 3947
14 26,525 4247 40 20,969 2509
15 27,360 3982 41 27,224 5440
16 21,690 3568 42 25,892 4042
17 21,974 3155 43 22,644 3402
18 20,816 3059 44 24,640 2829
19 18,095 2967 45 22,341 2297
20 20,939 3285 46 25,610 2932
21 22,644 3914 47 26,015 3705
22 24,624 4517 48 25,788 4123
23 27,186 4349 49 29,132 3608
24 33,990 5020 50 41,480 8349
25 23,382 3594 51 25,845 3766
26 20,627 2821


Source: National Education Association, as reported by Albuquerque Tribune, Nov. 7, 1986. 


Empirical Exercises


5.9. Table 5.5 gives data on average public teacher pay (annual salary in dollars) and spending on public 
schools per pupil (dollars) in 1985 for 50 states and the District of Columbia. 

To find out if there is any relationship between teacher’s pay and per pupil expenditure in public schools, the following model was suggested: Payᵢ = β₁ + β₂Spendᵢ + uᵢ, where Pay stands for teacher’s salary and Spend stands for per pupil expenditure.

a. Plot the data and eyeball a regression line.

b. Suppose on the basis of (a) you decide to estimate the above regression model. Obtain the estimates of the parameters, their standard errors, r², RSS, and ESS.

c. Interpret the regression. Does it make economic sense?

d. Establish a 95 percent confidence interval for β₂. Would you reject the hypothesis that the true slope coefficient is 3.0?

e. Obtain the mean and individual forecast value of Pay if per pupil spending is $5,000. Also establish 95 percent confidence intervals for the true mean and individual values of Pay for the given spending figure.

f. How would you test the assumption of the normality of the error term? Show the test(s) you use.


5.10. Refer to Exercise 3.20 and set up the ANOVA tables and test the hypothesis that there is no relationship between productivity and real wage compensation. Do this for both the business and nonfarm business sectors.


5.11. Refer to Exercise 1.7.

a. Plot the data with impressions on the vertical axis and advertising expenditure on the horizontal 
axis. What kind of relationship do you observe? 

b. Would it be appropriate to fit a bivariate linear regression model to the data? Why or why not? If 
not, what type of regression model will you fit the data to? Do we have the necessary tools to fit 
such a model? 

c. Suppose you do not plot the data and simply fit the bivariate regression model to the data. Obtain 
the usual regression output. Save the results for a later look at this problem. 

5.12. Refer to Exercise 1.1.

a. Plot the Indian Consumer Price Index (CPI) against the Canadian CPI. What does the plot show? 

b. Suppose you want to predict the Indian CPI on the basis of the Canadian CPI. Develop a suitable 
model. 

c. Test the hypothesis that there is no relationship between the two CPIs. Use α = 5%. If you reject the null hypothesis, does that mean the Canadian CPI “causes” the Indian CPI? Why or why not?

5.13. Refer to Problem 3.22.

a. Estimate the two regressions given there, obtaining standard errors and the other usual output. 

b. Test the hypothesis that the disturbances in the two regression models are normally distributed. 

c. In the gold price regression, test the hypothesis that β₂ = 1, that is, there is a one-to-one relationship between gold prices and WPI (i.e., gold is a perfect hedge). What is the p value of the estimated test statistic?

d. Repeat step (c) for the SENSEX Index regression. Is investment in the stock market a perfect 
hedge against inflation? What is the null hypothesis you are testing? What is its p value? 

e. Between gold and stock, which investment would you choose? What is the basis of your decision? 

5.14. Table 5.6 gives data on GNP and three definitions of the money stock for India for 1951-52 to 2004-05. Regressing GNP on the various definitions of money, we obtain the results shown in Table 5.7.

The monetarists or quantity theorists maintain that nominal income (i.e., nominal GNP) is largely determined by changes in the quantity or the stock of money, although there is no consensus as to the “right” definition of money. Given the results in the preceding table, consider these questions:

a. Which definition of money seems to be closely related to nominal GNP?

b. Since the r² terms are uniformly high, does this fact mean that our choice for definition of money does not matter?

c. If the RBI wants to control the money supply, which one of these money measures is a better target for that purpose? Can you tell from the regression results?

5.15. Suppose the equation of an indifference curve between two goods is

$$
X_i Y_i = \beta_1 + \beta_2 X_i
$$

How would you estimate the parameters of this model? Apply the preceding model to the data in Table 5.8 and comment on your results.


Table 5.6 GNP and Three Measures of Money Stock 


Year GNP M0 M1 M3 Year GNP M0 M1 M3
1951-52 10686 1357 1812 2137 1978-79 111215 14082 17292 40112
1952-53 10497 1334 1764 2121 1979-80 122308 16573 20000 47226
1953-54 11433 1385 1828 2200 1980-81 145715 19452 23424 55774
1954-55 10805 14720 9S5 2379 1981-82 170845 20998 24937 62752
1955-56 11020 1676 2217 2683 1982-83 190425 23110 28535 73184
1956-57 13122 1734 2342 2869 1983-84 221541 28994 33398 86525
1957-58 13516 1804 2413 3164 1984-85 247884 35216 39915 102933
1958-59 15051 1929 2526 3476 1985-86 279901 38165 44095 119394
1959-60 15838 2111 2720 3883 1986-87 313011 44808 51516 141632
1960-61 17335 2239 2369 3964 1987-88 355242 53489 58555 164275
1961-62 18347 2352 3049 4247 1988-89 420035 62958 66786 193493
1962-63 19718 2546 3317 4560 1989-90 482953 MUN 81060 230950
1963-64 22662 2781 3752 5037 1990-91 562079 87779 92892 265828
1964-65 26418 2962 4080 5498 1991-92 644652 99505 114406 317049
1965-66 27852 3233 4829 6134 1992-93 740946 110779 124066 364016
1966-67 31418 3464 4951 6817 1993-94 853725 138672 150778 431084
1967-68 36875 3662 5350 7460 1994-95 1002681 169283 192257 527596
1968-69 39069 4069 5779 8306 1995-96 1178329 194457 214835 599791
1969-70 43027 4390 6536 9639 1996-97 1365535 199985 240615 696012
1970-71 45965 4822 7374 11020 1997-98 1513953 226402 267844 821332
1971-72 49232 5382 8223 12693 1998-99 1736231 259286 309068 980960
1972-73 54289 6033 9700 15013 1999-00 1936605 280555 341796 1124174
1973-74 66103 7273 11200 17624 2000-01 2079581 303311 379450 1313220
1974-75 78135 7604 11975 19549 2001-02 2258884 337970 422843 1498355
1975-76 83966 7808 13325 22480 2002-03 2437871 369061 473581 1717960
1976-77 90518 9798 16024 27781 2003-04 2733912 436512 578716 2005676
1977-78 102563 10941 14388 32906 2004-05 3127032 489135 647495 2251449
Source: Handbook of Statistics on Indian Economy, 2009-10, RBI, Mumbai.

Definitions:
• GNP = Gross National Product at current market prices (Rs. crore)
• M0 (Reserve Money) = includes currency in circulation, ‘Other’ deposits with the RBI, and bankers’ deposits with the RBI
• M1 (Narrow Money) = includes currency with the public, ‘Other’ deposits with the RBI, and demand deposits
• M3 (Broad Money) = includes time deposits and M1


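Table 5.7 GNP-Money Stock Regressions, 1951-52 to 2004-05

$$
\begin{aligned}
&1.\ \ \widehat{\text{GNP}}_t = \underset{(6228.7559)}{12653.0039} + \underset{(0.0428)}{6.4829}\,M0_t, \qquad r^2 = 0.9977 \\
&2.\ \ \widehat{\text{GNP}}_t = \underset{(10643.7083)}{32456.4294} + \underset{(0.0581)}{5.0894}\,M1_t, \qquad r^2 = 0.9933 \\
&3.\ \ \widehat{\text{GNP}}_t = \underset{(17127.5435)}{75680.1785} + \underset{(0.0277)}{1.4633}\,M3_t, \qquad r^2 = 0.9818
\end{aligned}
$$

Note: The figures in parentheses are the estimated standard errors.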
Table 5.8

Consumption of good X: 1 2 3 4 5
Consumption of good Y: 4 3.5 2.8 1.9 0.8


5.16. Since 1986 the Economist has been publishing the Big Mac Index as a crude, and hilarious, measure of whether international currencies are at their “correct” exchange rate, as judged by the theory of purchasing power parity (PPP). The PPP holds that a unit of currency should be able to buy the same bundle of goods in all countries. The proponents of PPP argue that, in the long run, currencies tend to move toward their PPP. The Economist uses McDonald’s Big Mac as a representative bundle and gives the information in Table 5.9.

Consider the following regression model:

$$
Y_i = \beta_1 + \beta_2 X_i + u_i
$$

where Y = actual exchange rate and X = implied PPP of the dollar.

a. If the PPP holds, what values of β₁ and β₂ would you expect a priori?


Table 5.9 The Hamburger Standard

Columns: (1) Big Mac price in local currency; (2) Big Mac price in dollars; (3) implied PPP* of the dollar; (4) actual dollar exchange rate, Jan 31st; (5) under (−)/over (+) valuation against the dollar, %

Country (1) (2) (3) (4) (5)
United States† $3.22 3.22
Argentina Peso 8.25 2.65 2.56 Bu −18
Australia A$3.45 2.67 1.07 1.29 −17
Brazil Real 6.40 3.01 1.99 2.13 −6
Britain £1.99 3.90 1.62‡ 1.96‡ +21
Canada C$3.63 3.08 1.13 1.18 −4
Chile Peso 1,670 3.07 519 544 −5
China Yuan 11.0 1.41 3.42 7.77 −56
Colombia Peso 6,900 3.06 2,143 2,254 −5
Costa Rica Colones 1,130 2.18 351 519 −32
Czech Republic Koruna 52.1 2.41 16.2 21.6 −25
Denmark DKr27.75 4.84 8.62 5.74 +50
Egypt Pound 9.09 1.60 2.82 5.70 −50
Estonia Kroon 30 2.49 9.32 12.0 −23
Euro area§ €2.94 3.82 1.10** 1.30** +19
Hong Kong HK$12.0 1.54 3.73 7.81 −52
Hungary Forint 590 3.00 183 197 −7
Iceland Kronur 509 7.44 158 68.4 +131
Indonesia Rupiah 15,900 1.75 4,938 9,100 −46
Japan ¥280 2.31 87.0 121 −28
Latvia Lats 1.35 2.52 0.42 0.54 −22
Lithuania Litas 6.50 2.45 2.02 2.66 −24
Malaysia Ringgit 5.50 1.57 1.71 3.50 −51
Mexico Peso 29.0 2.66 9.01 10.9 −17
New Zealand NZ$4.60 3.16 1.43 1.45 −2
Norway Kroner 41.5 6.63 12.9 6.26 +106
Pakistan Rupee 140 2.31 43.5 60.7 −28
Paraguay Guarani 10,000 1.90 3,106 5,250 −41
Peru New Sol 9.50 2.97 2.95 3.20 −8
Philippines Peso 85.0 1.74 26.4 48.9 −46
Poland Zloty 6.90 2.29 2.14 3.01 −29
Russia Rouble 49.0 1.85 15.2 26.5 −43
Saudi Arabia Riyal 9.00 2.40 2.80 3.75 −25
Singapore S$3.60 2.34 1.12 1.54 −27
Slovakia Crown 57.98 2.13 18.0 27.2 −34
South Africa Rand 15.5 2.14 4.81 as −34
South Korea Won 2,900 3.08 901 942 −4
Sri Lanka Rupee 190 1.75 59.0 109 −46
Sweden SKr32.0 4.59 9.94 6.97 +43
Switzerland SFr6.30 5.05 1.96 1.25 +57
Taiwan NT$75.0 2.28 23.3 32.9 −29
Thailand Baht 62.0 1.78 19.3 34.7 −45
Turkey Lire 4.55 3.22 1.41 1.41 nil
UAE Dirhams 10.0 2.72 3.11 3.67 −15
Ukraine Hryvnia 9.00 1.71 2.80 5.27 −47
Uruguay Peso 55.0 2.17 17.1 25.3 −33
Venezuela Bolivar 6,800 1.58 2,112 4,307 −51

*Purchasing power parity: local price divided by price in the United States.
**Dollars per euro.
†Average of New York, Chicago, San Francisco, and Atlanta.
‡Dollars per pound.
§Weighted average of prices in euro area.

Source: McDonald’s; The Economist, February 1, 2007.


b. Do the regression results support your expectation? What formal test do you use to test your hypothesis?

c. Should the Economist continue to publish the Big Mac Index? Why or why not? 

5.17. Refer to the SAT data given in Exercise 2.16. Suppose you want to predict the male math (Y) scores on the basis of the female math scores (X) by running the following regression:

$$
Y_t = \beta_1 + \beta_2 X_t + u_t
$$

a. Estimate the preceding model.

b. From the estimated residuals, find out if the normality assumption can be sustained.

c. Now test the hypothesis that β₂ = 1, that is, there is a one-to-one correspondence between male and female math scores.

d. Set up the ANOVA table for this problem.

5.18. Repeat the exercise in the preceding problem but let Y and X denote the male and female critical reading scores, respectively.


5.19. Table 5.10 gives data on the Consumer Price Index for Agricultural Laborers (CPI-AL), Industrial Workers (CPI-IW), and Urban Non-manual Employees (CPI-UNE), all converted to base year 1960 = 100, along with the Wholesale Price Index (WPI) of all commodities (1960-61 = 100) for the period 1970-71 to 2004-05.


Table 5.10 CPI and WPI, India, 1970-71 to 2004-05 


Year CPI-AL CPI-IW CPI-UNE WPI Year CPI-AL CPI-IW CPI-UNE WPI 
1970-71 194 186 174 181 1988-89 708 804 728 788 
1971-72 196 192 180 188 1989-90 746 853 776 844 
1972-73 217 207 192 207 1990-91 803 951 861 931 
1973-74 263 250 221 254 1991-92 958 1080 979 1059
1974-75 354 317 270 313 1992-93 1076 1183 1081 1165 
1975-76 340 313 277 303 1993-94 1114 1272 1156 1262 
1976-77 293 301 277 311 1994-95 1247 1400 1268 1399 
1977-78 324 324 296 337 1995-96 1381 1543 1386 1507 
1978-79 317 331 306 337 1996-97 1508 1686 1514 1603 
1979-80 346 360 330 394 1997-98 1555 1804 1616 1680 
1980-81 395 401 369 466 1998-99 1726 2041 1803 1795 
1981-82 444 451 413 509 1999-00 1802 2110 1883 1850 
1982-83 467 486 446 523 2000-01 1796 2189 1985 1966 
1983-84 520 547 492 572 2001-02 1820 2283 2087 2036 
1984-85 521 582 535 613 2002-03 1879 2376 2167 2106 
1985-86 546 621 572 648 2003-04 1950 2465 2247 2221 
1986-87 572 675 615 682 2004-05 2003 2564 2333 2364 
1987-88 629 735 674 734 


Source: Handbook of Industrial Policy and Statistics 2007-08, RBI & Labour Bureau website, Ministry of Labour, Government of India.


a. Plot the CPI on the vertical axis and the WPI on the horizontal axis. A priori, what kind of 
relationship do you expect between the two indexes? Why? 

b. Suppose you want to predict one of these indexes on the basis of the other index. Which will you 
use as the regressand and which as the regressor? Why? 

c. Run the regression you have decided in (b). Show the standard output. Test the hypothesis that 
there is a one-to-one relationship between the two indexes. 

d. From the residuals obtained from the regression in (c), can you entertain the hypothesis that the 
true error term is normally distributed? Show the tests you use. 

e. Repeat (a) to (d) for the other CPI indices. What do the results indicate? 

Table 5.11 provides data on the lung cancer mortality index (100 = average) and the smoking index (100 = average) for 25 occupational groups.

a. Plot the cancer mortality index against the smoking index. What general pattern do you observe? 

b. Letting Y = cancer mortality index and X = smoking index, estimate a linear regression model and 
obtain the usual regression statistics. 

c. Test the hypothesis that smoking has no influence on lung cancer at α = 5%.
d. Which are the risky occupations in terms of lung cancer mortality? Can you give some reasons why this might be so?
e. Is there any way to bring occupation category explicitly into the regression analysis? 


Table 5.11 Smoking and Lung Cancer 


Occupation Smoking Cancer
Farmers, foresters, fishermen 77 84
Miners and quarrymen 137 116
Gas, coke, and chemical makers 117 123
Glass and ceramic makers 94 128
Furnace forge foundry workers 116 155
Electrical and electronic workers 102 101
Engineering and allied trades 111 118
Wood workers 93 113
Leather workers 88 104
Textile workers 102 88
Clothing workers 91 104
Food, drink, and tobacco workers 104 129
Paper and printing workers 107 86
Makers of other products 112 96
Construction workers 113 144
Painters and decorators 110 139
Drivers of engines, cranes, etc. 125 113
Laborers not included elsewhere 113 146
Transportation and communication workers 115 128
Warehousemen, store keepers, etc. 105 115
Clerical workers 87 79
Sales workers 91 85
Service, sports, recreation workers 100 120
Administrators and managers 76 60
Artists and professional and technical workers 66 51


Source: http://lib.stat.cmu.edu/DASL/Datafiles/SmokingandCancer.html. 


Key to Multiple Choice Questions 


1. (c) 2. (a) 3-9. [illegible]
10. (a) 11. (b) 12. (d) 13. (b) 14. [illegible] 15. (a) 16. [illegible] 17. [illegible] 18. (a)
19. (b) 20. (a) 21. (b) 22. (b) 23. (c) 24-27. [illegible]
28. (b) 29. (c) 30. (a) 31. (b) 32. (d) 33. [illegible] 34. (a) 35. (a)
Appendix 5A


5A.1 Probability Distributions Related to the Normal Distribution


The t, chi-square ($\chi^2$), and F probability distributions, whose salient features are discussed in Appendix A, are intimately related to the normal distribution. Since we will make heavy use of these probability distributions in the following chapters, we summarize their relationship with the normal distribution in the following theorems; the proofs, which are beyond the scope of this book, can be found in the references.¹


Theorem 5.1. If $Z_1, Z_2, \ldots, Z_n$ are normally and independently distributed random variables such that $Z_i \sim N(\mu_i, \sigma_i^2)$, then the sum $Z = \sum k_i Z_i$, where the $k_i$ are constants not all zero, is also distributed normally with mean $\sum k_i \mu_i$ and variance $\sum k_i^2 \sigma_i^2$; that is, $Z \sim N\left(\sum k_i \mu_i, \sum k_i^2 \sigma_i^2\right)$. Note: $\mu$ denotes the mean value.


In short, linear combinations of normal variables are themselves normally distributed. For example, if $Z_1$ and $Z_2$ are normally and independently distributed as $Z_1 \sim N(10, 2)$ and $Z_2 \sim N(8, 1.5)$, then the linear combination $Z = 0.8Z_1 + 0.2Z_2$ is also normally distributed with mean = 0.8(10) + 0.2(8) = 9.6 and variance = 0.64(2) + 0.04(1.5) = 1.34, that is, $Z \sim N(9.6, 1.34)$.
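As a quick numerical check of this example, the Python sketch below uses only the standard library. It computes the mean and variance implied by Theorem 5.1 and compares them with a Monte Carlo simulation. The function name, the seed, and the number of draws are illustrative choices that do not come from the text, and the simulated moments will only approximate 9.6 and 1.34.

```python
import random
import statistics

def independent_combination_moments(ks, means, variances):
    """Theorem 5.1: mean and variance of sum(k_i * Z_i) for independent Z_i ~ N(mu_i, sigma_i^2)."""
    mean = sum(k * m for k, m in zip(ks, means))
    var = sum(k * k * v for k, v in zip(ks, variances))
    return mean, var

# The example in the text: Z1 ~ N(10, 2), Z2 ~ N(8, 1.5), Z = 0.8*Z1 + 0.2*Z2
ks, means, variances = [0.8, 0.2], [10.0, 8.0], [2.0, 1.5]
mean, var = independent_combination_moments(ks, means, variances)
print(round(mean, 4), round(var, 4))  # 9.6 1.34

# Monte Carlo check; random.gauss expects a standard deviation, hence v ** 0.5
rng = random.Random(42)
draws = [
    sum(k * rng.gauss(m, v ** 0.5) for k, m, v in zip(ks, means, variances))
    for _ in range(50_000)
]
print(round(statistics.fmean(draws), 3), round(statistics.variance(draws), 3))
```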


Theorem 5.2. If $Z_1, Z_2, \ldots, Z_n$ are normally distributed but are not independent, the sum $Z = \sum k_i Z_i$, where the $k_i$ are constants not all zero, is also normally distributed with mean $\sum k_i \mu_i$ and variance $\left[\sum k_i^2 \sigma_i^2 + 2\sum_{i<j} k_i k_j \operatorname{cov}(Z_i, Z_j)\right]$.

Thus, if $Z_1 \sim N(6, 2)$ and $Z_2 \sim N(7, 3)$ and $\operatorname{cov}(Z_1, Z_2) = 0.8$, then the linear combination $0.6Z_1 + 0.4Z_2$ is also normally distributed with mean = 0.6(6) + 0.4(7) = 6.4 and variance = [0.36(2) + 0.16(3) + 2(0.6)(0.4)(0.8)] = 1.584.
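The same check can be run for the correlated case. The sketch below computes the Theorem 5.2 moments and then simulates correlated normals using a simple Cholesky-style construction, in which $Z_2$ is built from $Z_1$'s standard normal shock plus an independent shock. That construction, the seed, and the sample size are assumptions made for illustration only.

```python
import math
import random
import statistics

def combination_moments(k1, k2, mu1, mu2, var1, var2, cov12):
    """Theorem 5.2 for two jointly normal variables: mean and variance of k1*Z1 + k2*Z2."""
    mean = k1 * mu1 + k2 * mu2
    var = k1 ** 2 * var1 + k2 ** 2 * var2 + 2 * k1 * k2 * cov12
    return mean, var

mean, var = combination_moments(0.6, 0.4, 6.0, 7.0, 2.0, 3.0, 0.8)
print(round(mean, 4), round(var, 4))  # 6.4 1.584

# Simulation: Z2 loads on Z1's shock so that cov(Z1, Z2) = 0.8 and var(Z2) = 3
rng = random.Random(7)
a = 0.8 / math.sqrt(2.0)        # cov(Z1, Z2) = sqrt(2) * a = 0.8
b = math.sqrt(3.0 - a * a)      # a^2 + b^2 = var(Z2) = 3
draws = []
for _ in range(50_000):
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    z1 = 6.0 + math.sqrt(2.0) * e1
    z2 = 7.0 + a * e1 + b * e2
    draws.append(0.6 * z1 + 0.4 * z2)
print(round(statistics.fmean(draws), 3), round(statistics.variance(draws), 3))
```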


Theorem 5.3. If $Z_1, Z_2, \ldots, Z_n$ are normally and independently distributed random variables such that each $Z_i \sim N(0, 1)$, that is, a standardized normal variable, then $\sum Z_i^2 = Z_1^2 + Z_2^2 + \cdots + Z_n^2$ follows the chi-square distribution with n df. Symbolically, $\sum Z_i^2 \sim \chi^2_n$, where n denotes the degrees of freedom, df.


In short, "the sum of the squares of independent standard normal variables has a chi-square distribution with degrees of freedom equal to the number of terms in the sum."
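Theorem 5.3 can also be illustrated by simulation. The minimal Python sketch below uses only the standard library. It repeatedly sums n squared standard normal draws and compares the sample mean and variance with n and 2n, which are the standard mean and variance of a chi-square variable with n df. The choice n = 5, the seed, and the sample size are arbitrary.

```python
import random
import statistics

def sum_of_squared_normals(n, rng):
    """Theorem 5.3: the sum of n squared independent N(0, 1) draws is chi-square with n df."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

rng = random.Random(1)
n = 5
draws = [sum_of_squared_normals(n, rng) for _ in range(40_000)]

# A chi-square variable with n df has mean n and variance 2n
sim_mean = statistics.fmean(draws)
sim_var = statistics.variance(draws)
print(n, round(sim_mean, 2), 2 * n, round(sim_var, 2))
```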


Theorem 5.4. If $Z_1, Z_2, \ldots, Z_n$ are independently distributed random variables each following the chi-square distribution with $k_i$ df, then the sum $\sum Z_i = Z_1 + Z_2 + \cdots + Z_n$ also follows a chi-square distribution with $k = \sum k_i$ df.


Thus, if $Z_1$ and $Z_2$ are independent $\chi^2$ variables with df of $k_1$ and $k_2$, respectively, then $Z = Z_1 + Z_2$ is also a $\chi^2$ variable with $(k_1 + k_2)$ degrees of freedom. This is called the reproductive property of the $\chi^2$ distribution.
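The reproductive property can be checked in the same way. In this sketch, which is an illustration and not a proof, each chi-square draw is built from squared standard normals as in Theorem 5.3. The sum of independent $\chi^2_3$ and $\chi^2_4$ draws should then show a mean near 7 and a variance near 14, the moments of a $\chi^2_7$ variable. The df values and the seed are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(3)

def chi_square(k):
    """A chi-square draw with k df, built as in Theorem 5.3 from k squared N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

k1, k2 = 3, 4
# Reproductive property: Z1 + Z2 should behave like chi-square with k1 + k2 = 7 df
sums = [chi_square(k1) + chi_square(k2) for _ in range(40_000)]
print(round(statistics.fmean(sums), 2), round(statistics.variance(sums), 2))  # compare with 7 and 14
```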


Theorem 5.5. If $Z_1$ is a standardized normal variable [$Z_1 \sim N(0, 1)$] and another variable $Z_2$ follows the chi-square distribution with k df and is independent of $Z_1$, then the variable defined as

$$t = \frac{Z_1}{\sqrt{Z_2/k}} = \frac{Z_1\sqrt{k}}{\sqrt{Z_2}} = \frac{\text{standard normal variable}}{\sqrt{\text{independent chi-square variable}/\text{df}}} \sim t_k$$

follows Student's t distribution with k df. Note: This distribution is discussed in Appendix A and is illustrated in Chapter 5.

Incidentally, note that as k, the df, increases indefinitely (i.e., as $k \to \infty$), the Student's t distribution approaches the standardized normal distribution.² As a matter of convention, the notation $t_k$ means Student's t distribution or variable with k df.
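To see Theorem 5.5 and this limiting behavior in action, the sketch below builds t variables directly as $Z_1/\sqrt{Z_2/k}$. It then records how often $|t|$ exceeds 1.96, the two-sided 5% point of the standard normal. This is a simulation sketch that rests on arbitrary choices (k = 5 and k = 30, a fixed seed). The fatter tails for small k should show up as a noticeably larger share.

```python
import math
import random

rng = random.Random(11)

def t_draw(k):
    """Theorem 5.5: Z1 / sqrt(Z2 / k), with Z1 ~ N(0, 1) and an independent Z2 ~ chi-square(k)."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
    return z1 / math.sqrt(z2 / k)

# Share of draws with |t| > 1.96; for a standard normal variable it is about 0.05
tail_share = {}
for k in (5, 30):
    draws = [t_draw(k) for _ in range(20_000)]
    tail_share[k] = sum(abs(t) > 1.96 for t in draws) / len(draws)
print(tail_share)  # the k = 5 share is clearly larger; the k = 30 share is closer to 0.05
```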


¹For proofs of the various theorems, see Alexander M. Mood, Franklin A. Graybill, and Duane C. Boes, Introduction to the Theory of Statistics, 3d ed., McGraw-Hill, New York, 1974, pp. 239-249.

²Ibid., p. 243.

³For proof, see Henri Theil, Introduction to Econometrics, Prentice Hall, Englewood Cliffs, NJ, 1978, pp. 237-245.
Theorem 5.6. If $Z_1$ and $Z_2$ are independently distributed chi-square variables with $k_1$ and $k_2$ df, respectively, then the variable

$$F = \frac{Z_1/k_1}{Z_2/k_2} \sim F_{k_1,k_2}$$

has the F distribution with $k_1$ and $k_2$ degrees of freedom, where $k_1$ is known as the numerator degrees of freedom and $k_2$ the denominator degrees of freedom.
Again as a matter of convention, the notation $F_{k_1,k_2}$ means an F variable with $k_1$ and $k_2$ degrees of freedom, the df in the numerator being quoted first.
In other words, Theorem 5.6 states that the F variable is simply the ratio of two independently distributed chi-square variables, each divided by its respective degrees of freedom.


Theorem 5.7. The square of (Student's) t variable with k df has an F distribution with $k_1 = 1$ df in the numerator and $k_2 = k$ df in the denominator.⁴ That is,

$$F_{1,k} = t_k^2$$

Note that for this equality to hold, the numerator df of the F variable must be 1. Thus, $F_{1,4} = t_4^2$ or $F_{1,23} = t_{23}^2$, and so on.
As noted, we will see the practical utility of the preceding theorems as we progress.
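The sketch below constructs F variables exactly as Theorem 5.6 describes and then checks Theorem 5.7 on a single draw. It relies on the standard result that the mean of an $F_{k_1,k_2}$ variable is $k_2/(k_2 - 2)$ for $k_2 > 2$, a result not stated in the text above. The df values and the seed are illustrative choices.

```python
import math
import random
import statistics

rng = random.Random(5)

def chi_square(k):
    """Chi-square draw with k df, built from k squared N(0, 1) draws (Theorem 5.3)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

def f_draw(k1, k2):
    """Theorem 5.6: (Z1/k1) / (Z2/k2) for independent chi-square Z1 (k1 df) and Z2 (k2 df)."""
    return (chi_square(k1) / k1) / (chi_square(k2) / k2)

f_draws = [f_draw(3, 10) for _ in range(20_000)]
print(round(statistics.fmean(f_draws), 3))  # close to 10 / (10 - 2) = 1.25, the mean of F(3, 10)

# Theorem 5.7: with k1 = 1, F is the square of the t variable built from the same draws
k = 10
z, chi = rng.gauss(0.0, 1.0), chi_square(k)
t = z / math.sqrt(chi / k)       # Theorem 5.5
f = (z ** 2 / 1) / (chi / k)     # Theorem 5.6 with Z1 = z**2, a chi-square variable with 1 df
print(math.isclose(t ** 2, f))   # True
```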


Theorem 5.8. For large denominator df, the numerator df times the F value is approximately equal to the chi-square value with the numerator df. Thus,

$$m F_{m,n} = \chi^2_m \quad \text{as } n \to \infty$$


Theorem 5.9. For sufficiently large df, the chi-square distribution can be approximated by the standard normal distribution as follows:

$$Z = \sqrt{2\chi^2} - \sqrt{2k - 1} \sim N(0, 1)$$

where k denotes df.
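Both large-sample approximations can be checked by simulation. This Python sketch draws chi-square variables using the standard identity that a $\chi^2_k$ variable is a gamma variable with shape k/2 and scale 2 (`random.gammavariate(alpha, beta)` takes shape and scale). For large k this is faster than summing squared normals. The df values, the seed, and the sample sizes are assumptions made for illustration.

```python
import math
import random
import statistics

rng = random.Random(9)

def chi_square(k):
    """Chi-square draw with k df via the identity chi-square(k) = gamma(shape=k/2, scale=2)."""
    return rng.gammavariate(k / 2, 2.0)

# Theorem 5.8: m * F(m, n) is close to chi-square(m) when n is large (mean m, variance 2m)
m, n = 4, 1000
mf = [m * (chi_square(m) / m) / (chi_square(n) / n) for _ in range(20_000)]
print(round(statistics.fmean(mf), 2), round(statistics.variance(mf), 2))  # near 4 and 8

# Theorem 5.9: sqrt(2 * chi2) - sqrt(2k - 1) is close to N(0, 1) for large k
k = 100
z_values = [math.sqrt(2 * chi_square(k)) - math.sqrt(2 * k - 1) for _ in range(20_000)]
share_below = sum(z < 1.645 for z in z_values) / len(z_values)
print(round(statistics.fmean(z_values), 3), round(share_below, 3))  # near 0 and 0.95
```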


5A.2 Derivation of Equation (5.3.2) 


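Let

$$Z_1 = \frac{\hat\beta_2 - \beta_2}{\operatorname{se}(\hat\beta_2)} = \frac{(\hat\beta_2 - \beta_2)\sqrt{\sum x_i^2}}{\sigma} \qquad (1)$$

and

$$Z_2 = (n - 2)\frac{\hat\sigma^2}{\sigma^2} \qquad (2)$$

Provided $\sigma$ is known, $Z_1$ follows the standardized normal distribution; that is, $Z_1 \sim N(0, 1)$. (Why?) $Z_2$ follows the $\chi^2$ distribution with $(n - 2)$ df.⁵ Furthermore, it can be shown that $Z_2$ is distributed independently of $Z_1$.⁶ Therefore, by virtue of Theorem 5.5, the variable

$$t = \frac{Z_1}{\sqrt{Z_2/(n - 2)}} \qquad (3)$$

follows the t distribution with n − 2 df. Substitution of Eqs. (1) and (2) into Eq. (3) gives Eq. (5.3.2).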
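The key point of this derivation is that the unknown $\sigma$ cancels when Eqs. (1) and (2) are substituted into Eq. (3). The sketch below verifies this numerically. Its input numbers are hypothetical and serve only to show two things: the t value from Eq. (3) does not change with $\sigma$, and it equals the usual ratio $(\hat\beta_2 - \beta_2)/\operatorname{se}(\hat\beta_2)$ computed with $\hat\sigma$.

```python
import math

def t_via_z1_z2(b2_hat, b2, sum_x2, sigma_hat, n, sigma):
    """Eq. (3): t = Z1 / sqrt(Z2 / (n - 2)), with Z1 from Eq. (1) and Z2 from Eq. (2)."""
    z1 = (b2_hat - b2) * math.sqrt(sum_x2) / sigma   # Eq. (1): uses the true sigma
    z2 = (n - 2) * sigma_hat ** 2 / sigma ** 2       # Eq. (2)
    return z1 / math.sqrt(z2 / (n - 2))

def t_usual(b2_hat, b2, sum_x2, sigma_hat):
    """(b2_hat - b2) / se(b2_hat), with se(b2_hat) = sigma_hat / sqrt(sum of x_i^2)."""
    return (b2_hat - b2) / (sigma_hat / math.sqrt(sum_x2))

# Hypothetical inputs, used only to show that the unknown sigma drops out
inputs = dict(b2_hat=0.72, b2=0.5, sum_x2=330.0, sigma_hat=2.1)
for sigma in (0.5, 2.0, 10.0):
    print(sigma, t_via_z1_z2(n=10, sigma=sigma, **inputs), t_usual(**inputs))
```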


⁴For proof, see Eqs. (5.3.2) and (5.9.1).

⁵For proof, see Robert V. Hogg and Allen T. Craig, Introduction to Mathematical Statistics, 2d ed., Macmillan, New York, 1965, p. 144.

⁶For proof, see J. Johnston, Econometric Methods, 3d ed., McGraw-Hill, New York, 1984, pp. 181-182. (Knowledge of matrix algebra is required to follow the proof.)
5A.3 Derivation of Equation (5.9.1)

Equation (1) shows that $Z_1 \sim N(0, 1)$. Therefore, by Theorem 5.3, the quantity

$$Z_1^2 = \frac{(\hat\beta_2 - \beta_2)^2 \sum x_i^2}{\sigma^2}$$

follows the $\chi^2$ distribution with 1 df. As noted in Section 5A.2,

$$Z_2 = (n - 2)\frac{\hat\sigma^2}{\sigma^2} = \frac{\sum \hat u_i^2}{\sigma^2}$$

also follows the $\chi^2$ distribution with n − 2 df. Moreover, as noted in Section 4.3, $Z_2$ is distributed independently of $Z_1$. Then from Theorem 5.6, it follows that

$$F = \frac{Z_1^2/1}{Z_2/(n - 2)} = \frac{(\hat\beta_2 - \beta_2)^2 \sum x_i^2}{\hat\sigma^2}$$

follows the F distribution with 1 and n − 2 df, respectively. Under the null hypothesis $H_0\colon \beta_2 = 0$, the preceding F ratio reduces to Eq. (5.9.1).
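Under $H_0\colon \beta_2 = 0$, the F ratio above is simply the square of the corresponding t ratio, in line with Theorem 5.7. Here is a minimal numerical check that uses hypothetical values of $\hat\beta_2$, $\sum x_i^2$, and $\hat\sigma$:

```python
import math

def f_ratio(b2_hat, b2, sum_x2, sigma_hat):
    """(b2_hat - b2)^2 * sum x_i^2 / sigma_hat^2, the F ratio above with 1 and n - 2 df."""
    return (b2_hat - b2) ** 2 * sum_x2 / sigma_hat ** 2

def t_ratio(b2_hat, b2, sum_x2, sigma_hat):
    """(b2_hat - b2) / se(b2_hat), with se(b2_hat) = sigma_hat / sqrt(sum of x_i^2)."""
    return (b2_hat - b2) / (sigma_hat / math.sqrt(sum_x2))

# Hypothetical values; under H0: beta2 = 0 the F ratio equals the squared t ratio
b2_hat, sum_x2, sigma_hat = 0.72, 330.0, 2.1
F = f_ratio(b2_hat, 0.0, sum_x2, sigma_hat)
t = t_ratio(b2_hat, 0.0, sum_x2, sigma_hat)
print(F, t ** 2)
```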


5A.4 Derivations of Equations (5.10.2) and (5.10.6) 


Variance of Mean Prediction 


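Given $X_i = X_0$, the true mean prediction $E(Y_0 \mid X_0)$ is given by

$$E(Y_0 \mid X_0) = \beta_1 + \beta_2 X_0 \qquad (1)$$

We estimate Eq. (1) from

$$\hat Y_0 = \hat\beta_1 + \hat\beta_2 X_0 \qquad (2)$$

Taking the expectation of Eq. (2), given $X_0$, we get

$$E(\hat Y_0) = E(\hat\beta_1) + E(\hat\beta_2) X_0 = \beta_1 + \beta_2 X_0$$

because $\hat\beta_1$ and $\hat\beta_2$ are unbiased estimators. Therefore,

$$E(\hat Y_0) = E(Y_0 \mid X_0) = \beta_1 + \beta_2 X_0 \qquad (3)$$

That is, $\hat Y_0$ is an unbiased predictor of $E(Y_0 \mid X_0)$.
Now using the property that var(a + b) = var(a) + var(b) + 2 cov(a, b), we obtain

$$\operatorname{var}(\hat Y_0) = \operatorname{var}(\hat\beta_1) + \operatorname{var}(\hat\beta_2) X_0^2 + 2\operatorname{cov}(\hat\beta_1, \hat\beta_2) X_0 \qquad (4)$$

Using the formulas for variances and covariance of $\hat\beta_1$ and $\hat\beta_2$ given in Eqs. (3.3.1), (3.3.3), and (3.3.9) and manipulating terms, we obtain

$$\operatorname{var}(\hat Y_0) = \sigma^2 \left[\frac{1}{n} + \frac{(X_0 - \bar X)^2}{\sum x_i^2}\right] \qquad (5.10.2)$$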
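The "manipulating terms" step can be checked numerically. The sketch below evaluates Eq. (4) using the usual two-variable formulas, var($\hat\beta_1$) = $\sigma^2 \sum X_i^2/(n\sum x_i^2)$, var($\hat\beta_2$) = $\sigma^2/\sum x_i^2$, and cov($\hat\beta_1, \hat\beta_2$) = $-\bar X\,\operatorname{var}(\hat\beta_2)$, and compares the result with the closed form (5.10.2). The X values and $\sigma^2$ are hypothetical.

```python
def mean_prediction_variance(xs, x0, sigma2):
    """Return var(Y0_hat) computed via Eq. (4) and via the closed form (5.10.2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sum_x2 = sum((x - x_bar) ** 2 for x in xs)   # sum of squared deviations from the mean
    raw_sum_X2 = sum(x * x for x in xs)          # raw sum of squares of X
    var_b1 = sigma2 * raw_sum_X2 / (n * sum_x2)  # variance of the intercept estimator
    var_b2 = sigma2 / sum_x2                     # variance of the slope estimator
    cov_b1_b2 = -x_bar * var_b2                  # their covariance
    via_eq4 = var_b1 + var_b2 * x0 ** 2 + 2 * cov_b1_b2 * x0
    closed_form = sigma2 * (1 / n + (x0 - x_bar) ** 2 / sum_x2)
    return via_eq4, closed_form

# Hypothetical X values and error variance, for illustration only
xs = list(range(80, 261, 20))
for x0 in (80, 170, 300):
    print(x0, mean_prediction_variance(xs, x0, sigma2=40.0))
```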


Variance of Individual Prediction 


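We want to predict an individual Y corresponding to $X = X_0$; that is, we want to obtain

$$Y_0 = \beta_1 + \beta_2 X_0 + u_0 \qquad (5)$$

We predict this as

$$\hat Y_0 = \hat\beta_1 + \hat\beta_2 X_0 \qquad (6)$$

The prediction error, $Y_0 - \hat Y_0$, is

$$Y_0 - \hat Y_0 = \beta_1 + \beta_2 X_0 + u_0 - (\hat\beta_1 + \hat\beta_2 X_0) = (\beta_1 - \hat\beta_1) + (\beta_2 - \hat\beta_2) X_0 + u_0 \qquad (7)$$

Therefore,

$$E(Y_0 - \hat Y_0) = E(\beta_1 - \hat\beta_1) + E(\beta_2 - \hat\beta_2) X_0 + E(u_0) = 0$$

because $\hat\beta_1$ and $\hat\beta_2$ are unbiased, $X_0$ is a fixed number, and $E(u_0)$ is zero by assumption.
Squaring Eq. (7) on both sides and taking expectations, we get $\operatorname{var}(Y_0 - \hat Y_0) = \operatorname{var}(\hat\beta_1) + X_0^2 \operatorname{var}(\hat\beta_2) + 2X_0 \operatorname{cov}(\hat\beta_1, \hat\beta_2) + \operatorname{var}(u_0)$. Using the variance and covariance formulas for $\hat\beta_1$ and $\hat\beta_2$ given earlier, and noting that $\operatorname{var}(u_0) = \sigma^2$, we obtain

$$\operatorname{var}(Y_0 - \hat Y_0) = \sigma^2 \left[1 + \frac{1}{n} + \frac{(X_0 - \bar X)^2}{\sum x_i^2}\right] \qquad (5.10.6)$$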
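A short sketch comparing the two prediction variances follows. By Eqs. (5.10.2) and (5.10.6) they differ by exactly $\sigma^2$, the variance of the new disturbance $u_0$, and both are smallest when $X_0 = \bar X$. The X values and $\sigma^2$ below are hypothetical.

```python
def prediction_variances(xs, x0, sigma2):
    """Return (variance of the mean prediction, Eq. 5.10.2; variance of the individual prediction error, Eq. 5.10.6)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sum_x2 = sum((x - x_bar) ** 2 for x in xs)
    common = 1 / n + (x0 - x_bar) ** 2 / sum_x2   # bracketed term shared by both formulas
    return sigma2 * common, sigma2 * (1 + common)

xs = list(range(80, 261, 20))   # hypothetical X values; X_bar = 170
sigma2 = 40.0                   # hypothetical error variance
for x0 in (80, 170, 260):
    var_mean, var_indiv = prediction_variances(xs, x0, sigma2)
    # The gap is always sigma2, the variance of the new disturbance u0
    print(x0, round(var_mean, 3), round(var_indiv, 3))
```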


CHAPTER 6

Extensions of the Two-Variable Linear Regression Model


Some aspects of linear regression analysis can be easily introduced within the framework of the two-variable linear regression model that we have been discussing so far. First we consider the case of regression through the origin, that is, a situation where the intercept term, $\beta_1$, is absent from the model. Then we consider the question of the units of measurement, that is, how the Y and X variables are measured and whether a change in the units of measurement affects the regression results. Finally, we consider the question of the functional form of the linear regression model. So far we have considered models that are linear in the parameters as well as in the variables. But recall that the regression theory developed in the previous chapters requires only that the parameters be linear; the variables may or may not enter linearly in the model. By considering models that are linear in the parameters but not necessarily in the variables, we show in this chapter how the two-variable models can deal with some interesting practical problems.

Once the ideas introduced in this chapter are grasped, their extension to multiple regression models is 
quite straightforward, as we shall show in Chapters 7 and 8. 


6.1 Regression through the Origin 


There are occasions when the two-variable population regression function (PRF) assumes the following form: 


$$Y_i = \beta_2 X_i + u_i \qquad (6.1.1)$$


In this model the intercept term is absent or zero, hence the name regression through the origin. 
As an illustration, consider the capital asset pricing model (CAPM) of modern portfolio theory, which, in its risk-premium form, may be expressed as¹


$$(ER_i - r_f) = \beta_i (ER_m - r_f) \qquad (6.1.2)$$


¹See Haim Levy and Marshall Sarnat, Portfolio and Investment Selection: Theory and Practice, Prentice-Hall International, Englewood Cliffs, NJ, 1984, Chap. 14.
where $ER_i$ = expected rate of return on security i
$ER_m$ = expected rate of return on the market portfolio as represented by, say, the S&P 500 composite stock index
$r_f$ = risk-free rate of return, say, the return on 90-day Treasury bills
$\beta_i$ = the Beta coefficient, a measure of systematic risk, i.e., risk that cannot be eliminated through diversification. Also, a measure of the extent to which the ith security's rate of return moves with the market. A $\beta_i > 1$ implies a volatile or aggressive security, whereas a $\beta_i < 1$ suggests a defensive security. (Note: Do not confuse this $\beta_i$ with the slope coefficient of the two-variable regression, $\beta_2$.)
If capital markets work efficiently, then CAPM postulates that security i's expected risk premium ($= ER_i - r_f$) is equal to that security's $\beta$ coefficient times the expected market risk premium ($= ER_m - r_f$). If the CAPM holds, we have the situation depicted in Figure 6.1. The line shown in the figure is known as the security market line (SML).


[Figure: security market line, with $ER_i - r_f$ on the vertical axis]

Figure 6.1 Systematic risk.


For empirical purposes, Equation 6.1.2 is often expressed as 


$$R_i - r_f = \beta_i (R_m - r_f) + u_i \qquad (6.1.3)$$
or
$$R_i - r_f = \alpha_i + \beta_i (R_m - r_f) + u_i \qquad (6.1.4)$$

The latter model is known as the Market Model.² If CAPM holds, $\alpha_i$ is expected to be zero. (See Figure 6.2.)

In passing, note that in Eq. (6.1.4) the dependent variable, Y, is $(R_i - r_f)$ and the explanatory variable, X, is $\beta_i$, the volatility coefficient, and not $(R_m - r_f)$. Therefore, to run regression Eq. (6.1.4), one must first estimate $\beta_i$, which is usually derived from the characteristic line, as described in Exercise 5.5. (For further details, see Exercise 8.28.)

As this example shows, sometimes the underlying theory dictates that the intercept term be absent from the 
model. Other instances where the zero-intercept model may be appropriate are Milton Friedman’s permanent 
income hypothesis, which states that permanent consumption is proportional to permanent income; cost 


²See, for instance, Diana R. Harrington, Modern Portfolio Theory and the Capital Asset Pricing Model: A User's Guide, Prentice Hall, Englewood Cliffs, NJ, 1983, p. 71.
[Figure: security risk premium plotted against systematic risk, $\beta_i$]

Figure 6.2 The market model of portfolio theory (assuming $\alpha_i = 0$).


analysis theory, where it is postulated that the variable cost of production is proportional to output; and some 
versions of monetarist theory that state that the rate of change of prices (i.e., the rate of inflation) is propor- 
tional to the rate of change of the money supply. 

How do we estimate models like Eq. (6.1.1), and what special problems do they pose? To answer these 
questions, let us first write the sample regression function (SRF) of Eq. (6.1.1), namely, 


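$$Y_i = \hat\beta_2 X_i + \hat u_i \qquad (6.1.5)$$

Now applying the ordinary least squares (OLS) method to Eq. (6.1.5), we obtain the following formulas for $\hat\beta_2$ and its variance (proofs are given in Appendix 6A, Section 6A.1):

$$\hat\beta_2 = \frac{\sum X_i Y_i}{\sum X_i^2} \qquad (6.1.6)$$

$$\operatorname{var}(\hat\beta_2) = \frac{\sigma^2}{\sum X_i^2} \qquad (6.1.7)$$

where $\sigma^2$ is estimated by

$$\hat\sigma^2 = \frac{\sum \hat u_i^2}{n - 1} \qquad (6.1.8)$$

It is interesting to compare these formulas with those obtained when the intercept term is included in the model:

$$\hat\beta_2 = \frac{\sum x_i y_i}{\sum x_i^2} \qquad (3.1.6)$$

$$\operatorname{var}(\hat\beta_2) = \frac{\sigma^2}{\sum x_i^2} \qquad (3.3.1)$$

$$\hat\sigma^2 = \frac{\sum \hat u_i^2}{n - 2} \qquad (3.3.5)$$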
The differences between the two sets of formulas should be obvious. First, in the model with the intercept term absent, we use raw sums of squares and cross products, whereas in the intercept-present model we use adjusted (from mean) sums of squares and cross products. Second, the df for computing $\hat\sigma^2$ is (n − 1) in the first case and (n − 2) in the second case. (Why?)
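To make the contrast concrete, the sketch below implements both sets of formulas in plain Python on hypothetical data. One property is worth checking. With the intercept, the residuals sum to zero. Through the origin, the normal equation guarantees only $\sum X_i \hat u_i = 0$.

```python
def fit_through_origin(xs, ys):
    """Eqs. (6.1.6)-(6.1.8): raw sums of squares and cross products, n - 1 df."""
    n = len(xs)
    sum_XX = sum(x * x for x in xs)
    b2 = sum(x * y for x, y in zip(xs, ys)) / sum_XX
    resid = [y - b2 * x for x, y in zip(xs, ys)]
    sigma2_hat = sum(u * u for u in resid) / (n - 1)
    return b2, sigma2_hat / sum_XX, resid

def fit_with_intercept(xs, ys):
    """Eqs. (3.1.6), (3.3.1), (3.3.5): deviations from the means, n - 2 df."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum_xx
    b1 = y_bar - b2 * x_bar                       # intercept from the sample means
    resid = [y - b1 - b2 * x for x, y in zip(xs, ys)]
    sigma2_hat = sum(u * u for u in resid) / (n - 2)
    return b1, b2, sigma2_hat / sum_xx, resid

# Hypothetical data for illustration
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
print(fit_through_origin(xs, ys)[:2])   # slope and its estimated variance
print(fit_with_intercept(xs, ys)[:3])   # intercept, slope, estimated slope variance
```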

Although the interceptless or zero intercept model may be appropriate on occasions, some features of this model need to be noted. First, $\sum \hat u_i$, which is always zero for the model with the intercept term (the conventional model), need not be zero when that term is absent. In short, $\sum \hat u_i$ need not be zero for the regression through the origin. Second, $r^2$, the coefficient of determination introduced in Chapter 3, which is always non-negative for the conventional model, can on occasions turn out to be negative for the interceptless model! This anomalous result arises because the $r^2$ introduced in Chapter 3 explicitly assumes that the intercept is included in the model. Therefore, the conventionally computed $r^2$ may not be appropriate for regression-through-the-origin models.³
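A small hypothetical data set shows both anomalies. The Y values hover around 10 regardless of X, so the true intercept is far from zero. Forcing the line through the origin leaves residuals that do not sum to zero, and it makes the conventionally computed $r^2 = 1 - \text{RSS}/\sum (Y_i - \bar Y)^2$ strongly negative.

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [10.0, 9.0, 10.0, 9.0, 10.0]   # hypothetical data with a large true intercept

b2 = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)   # Eq. (6.1.6)
resid = [y - b2 * x for x, y in zip(xs, ys)]
y_bar = sum(ys) / len(ys)
rss = sum(u * u for u in resid)
tss = sum((y - y_bar) ** 2 for y in ys)     # mean-corrected total sum of squares
conventional_r2 = 1 - rss / tss

print(round(sum(resid), 4), round(conventional_r2, 2))  # 8.7273 -69.82
```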


r² for Regression-through-Origin Model


As just noted, and as further discussed in Appendix 6A, Section 6A.1, the conventional $r^2$ given in Chapter 3 is not appropriate for regressions that do not contain the intercept. But one can compute what is known as the raw $r^2$ for such models, which is defined as

$$\text{raw } r^2 = \frac{\left(\sum X_i Y_i\right)^2}{\sum X_i^2 \sum Y_i^2} \qquad (6.1.9)$$


Note: These are raw (i.e., not mean-corrected) sums of squares and cross products. 

Although this raw $r^2$ satisfies the relation $0 \le \text{raw } r^2 \le 1$, it is not directly comparable to the conventional $r^2$ value. For this reason some authors do not report the $r^2$ value for zero intercept regression models.
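The sketch below applies Eq. (6.1.9) to the same kind of hypothetical data. It also checks an equivalent expression, $1 - \sum \hat u_i^2/\sum Y_i^2$, which holds for OLS through the origin because the fitted values and residuals are orthogonal. That identity is added here for illustration and is not stated in the text.

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [10.0, 9.0, 10.0, 9.0, 10.0]   # hypothetical data, fitted without an intercept

sum_XY = sum(x * y for x, y in zip(xs, ys))
sum_XX = sum(x * x for x in xs)
sum_YY = sum(y * y for y in ys)
raw_r2 = sum_XY ** 2 / (sum_XX * sum_YY)    # Eq. (6.1.9)

# Equivalent form for OLS through the origin: 1 - RSS / (raw sum of squares of Y)
b2 = sum_XY / sum_XX
rss = sum((y - b2 * x) ** 2 for x, y in zip(xs, ys))
print(round(raw_r2, 4), round(1 - rss / sum_YY, 4))  # 0.8161 0.8161
```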

Because of these special features of this model, one needs to exercise great caution in using the zero intercept regression model. Unless there is a very strong a priori expectation, one would be well advised to stick to the conventional, intercept-present model. This has a dual advantage. First, if the intercept term is included in the model but it turns out to be statistically insignificant (i.e., statistically equal to zero), for all practical purposes we have a regression through the origin. Second, and more important, if in fact there is an intercept in the model but we insist on fitting a regression through the origin, we would be committing a specification error. We will discuss this more in Chapter 7.


Example 6.1


Table 6.1 gives data on excess returns $Y_t$ (%) on an index of 104 stocks in the sector of cyclical consumer goods and excess returns $X_t$ (%) on the overall stock market index for the U.K., for monthly data over the period 1980-1999, a total of 240 observations.⁵ Excess return refers to return in excess of return on a riskless asset (see the CAPM model).


³For additional discussion, see Dennis J. Aigner, Basic Econometrics, Prentice Hall, Englewood Cliffs, NJ, 1971, pp. 85-88.


⁴Henri Theil points out that if the intercept is in fact absent, the slope coefficient may be estimated with far greater precision than with the intercept term left in. See his Introduction to Econometrics, Prentice Hall, Englewood Cliffs, NJ, 1978, p. 76. See also the numerical example given next.


⁵These data, originally obtained from the DataStream databank, are reproduced from Christiaan Heij et al., Econometric Methods with Applications in Business and Economics, Oxford University Press, Oxford, U.K., 2004.


Table 6.1

OBS Y X
1980:01 6.08022852 7.263448404
1980:02 -0.924185461 6.339895504
1980:03 -3.286174252 -9.285216834
1980:04 5.211976571 0.793290771
1980:05 -16.16421111 -2.902420985
1980:06 -1.054703649 8.613150875
1980:07 11.17237699 3.982062848
1980:08 -11.06327551 -1.150170907
1980:09 -16.77699609 3.486125868
1980:10 -7.021834032 4.329850278
1980:11 -9.71684668 0.936875279
1980:12 5.215705717 -5.202455846
1981:01 -6.612000956 -2.082757509
1981:02 4.264498443 2.728522893
1981:03 4.916710821 0.653397106
1981:04 22.20495946 6.436071962
1981:05 -11.29868524 -4.259197932
1981:06 -5.770507783 0.543909707
1981:07 -5.217764717 -0.486845933
1981:08 16.19620175 2.843999508
1981:09 -17.16995395 -16.4572142
1981:10 1.105334728 4.468938171
1981:11 11.6853367 5.885519658
1981:12 -2.301451728 -0.390698164
1982:01 8.643728679 2.499567896
1982:02 -11.12907503 -4.033607075
1982:03 1.724627956 3.042525777
1982:04 0.157879967 0.734564665
1982:05 -1.875202616 2.779732288
1982:06 -10.62481767 -5.900116576
1982:07 -5.761135416 3.005344385
1982:08 5.481432596 3.954990619
1982:09 -17.02207459 2.547127067
1982:10 7.625420708 4.329008106
1982:11 -6.575721646 0.191940594
1982:12 -2.372829861 -0.92167555
1983:01 17.52374936 3.394682577
1983:02 1.354655809 0.758714353
1983:03 16.26861049 1.862073664
1983:04 -6.074547158 6.797751341
1983:05 -0.826650702 -1.699253628
1983:06 3.807881996 4.092592402
1983:07 0.57570091 -2.926299262
1983:08 3.755563441 1.773424306
1983:09 [illegible] -2.800815667
1983:10 -3.750302815 -1.505394995
1983:11 4.898751703 4.18696284
1983:12 4.379256151 1.201416981
1984:01 16.56016188 6.769320788
1984:02 1.523127464 -1.686027417
1984:03 1.0206078 5.245806105
1984:04 -3.899307684 1.728710264
1984:05 14.32501615 -7.279075595
1984:06 3.056627177 -0.77947067
1984:07 -0.02153592 -2.439634487
1984:08 3.355102212 8.445977813
1984:09 0.100006778 1.221080129
1984:10 1.691250318 2.733386772
1984:11 8.20075301 5.12753329
1984:12 3.52786616 3.191554763
1985:01 4.554587707 3.907838688
1985:02 5.365478677 -1.708567484
1985:03 4.525231564 0.435218492
1985:04 2.944654344 0.958067845
1985:05 -0.268599528 1.095477375
1985:06 -3.661040481 -6.816108909
1985:07 -4.540505062 2.785054354
1985:08 9.195292816 3.900209023
1985:09 -1.894817019 -4.203004414
1985:10 12.00661274 5.60179802
1985:11 1.233987382 1.570093976
1985:12 -1.446329607 -1.084427121
1986:01 6.023618851 0.778669473
1986:02 10.51235756 6.470651262
1986:03 13.40071024 8.953781192
1986:04 -7.796262998 -2.387761685
1986:05 0.211540446 -2.873838588
1986:06 6.471111064 3.440269098
1986:07 -9.037475168 -5.891053375
1986:08 -5.47838091 6.375582004
1986:09 -6.756881852 -5.734839396
1986:10 -2.564960223 3.63088408
1986:11 2.456599468 -1.31606687
1986:12 1.476421303 3.521601216
1987:01 17.0694004 8.673412896
1987:02 7.565726727 6.914361923
1987:03 3.239325817 -0.460660854
1987:04 3.662578335 4.295976077
1987:05 7.157455113 7.719692529
1987:06 4.774901623 3.039887622
1987:07 4.23770166 2.510223804
1987:08 -0.881352219 -3.039443563
1987:09 11.49688416 3.787092018
1987:10 -35.56617624 -27.86969311
1987:11 14.59137369 -9.956367094
1987:12 14.87271664 7.975865948
1988:01 1.748599294 3.936938398
1988:02 -0.606016446 -0.32797064
1988:03 -6.078095523 -2.161544202
1988:04 3.976153828 2.721787842
1988:05 -1.050910058 -0.514825422
1988:06 3.317856956 3.128796482
1988:07 0.407100105 0.181502075
1988:08 -11.87932524 -7.892363786
1988:09 -8.801026046 3.347081899
1988:10 6.784211277 3.158592144
1988:11 -10.20578119 -4.816470363
1988:12 -6.73805381 -0.008549997
1989:01 12.83903643 13.46098219
1989:02 3.302860922 -0.764474692
1989:03 -0.155918301 2.298491097
1989:04 3.623090767 0.762074588
1989:05 -1.167680873 -0.495796117
1989:06 -1.221603303 1.206636013
1989:07 5.262902744 4.637026116
1989:08 4.845013219 2.680874116
1989:09 -5.069564838 -5.303858035
1989:10 -13.57963526 -7.210655599


1989:11 1.100607603 5.350185944
1989:12 4.925083189 4.106245855
1990:01 -2.532068851 -3.629547374
1990:02 -6.601872876 -5.205804299
1990:03 -1.023768943 -2.183244863
1990:04 -7.097917266 -5.408563794
1990:05 6.376626925 10.57599169
1990:06 1.861974711 -0.338612099
1990:07 [illegible] -2.21316202
1990:08 -15.31758975 -8.476177427
1990:09 -10.17227358 -7.45941471
1990:10 -2.217396045 -0.085887763
1990:11 5.974205798 5.034770534
1990:12 -0.857289036 -1.767714908
1991:01 -3.780184589 0.189108456
1991:02 20.64721437 10.38741504
1991:03 10.94068018 2.921913827
1991:04 -3.145639589 0.971720188
1991:05 -3.142887645 -0.4317819
1991:06 -1.960866141 -3.342924986
1991:07 7.330964031 5.242811509
1991:08 7.854387926 2.880654691
1991:09 2.539177843 -1.121472224
1991:10 -1.233244642 -3.969577956
1991:11 -11.7460404 -5.707995062
1991:12 1.078226286 1.502567049
1992:01 5.937904622 2.599565094
1992:02 4.113184542 0.135881087
1992:03 -0.655199392 -6.146138064
1992:04 15.28430278 10.45736831
1992:05 3.994517585 1.415987046
1992:06 -11.94450998 -8.261109424
1992:07 -2.530701327 -3.778812167
1992:08 -9.842366221 -5.386818488
1992:09 18.11573724 11.19436372
1992:10 0.200950206 3.999870038
1992:11 1.125853097 3.620674752
1992:12 7.639180786 2.887222251
1993:01 2.919569408 1.336746091
1993:02 -1.062404105 1.240273846
1993:03 1.292641409 0.407144312
1993:04 0.420241384 -1.734930047
1993:05 -2.514080553 1.111533687
1993:06 0.419362276 1.354127742
1993:07 4.374024535 1.943061568
1993:08 1.733528075 4.961979827
1993:09 -3.659808969 -1.618729936
1993:10 5.85690764 4.215408608
1993:11 -1.365550294 1.880360165
1993:12 -1.346979017 5.826352413
1994:01 12.89578758 2.973540693
1994:02 -5.346700561 -5.479858563
1994:03 -7.614726564 -5.784547088
1994:04 10.22042923 1.157083438
1994:05 -6.928422261 -6.356199493
1994:06 -5.065919037 -0.843583888
1994:07 7.483498556 5.779953224
1994:08 1.828762662 3.298130184
1994:09 -5.69293279 -7.110010085
1994:10 -2.426962489 2.968005597
1994:11 2.125100668 -1.531245158


1994:12 -4.225370964 0.264280259
1995:01 -6.302392617 -2.420388431
1995:02 1.27867637 0.138795213
1995:03 10.90890516 3.231656585
1995:04 2.497849434 2.215804682
1995:05 2.891526594 3.856813589
1995:06 -3.773000069 0.952204306 (sign unclear in source)
1995:07 8.776288715 4.020036363
1995:08 2.88256097 1.423600345
1995:09 2.14691333 -0.037912571
1995:10 -4.590104662 -1.17655329
1995:11 -1.293255187 3.760277356
1995:12 -4.244101531 0.434626357
1996:01 6.647088904 1.906345103
1996:02 1.635900742 0.301898961
1996:03 7.8581899 -0.314132324
1996:04 0.789544896 3.034331741
1996:05 -0.907725397 -1.497346299
1996:06 -0.392246948 -0.894676854
1996:07 -1.035896351 -0.532816274
1996:08 2.556816005 3.863737088
1996:09 3.131830038 2.118254897
1996:10 -0.020947358 -0.853553262
1996:11 -5.312287782 1.770340939
1996:12 -5.196176326 1.702551635
1997:01 -0.753247124 3.465753348
1997:02 -2.474343938 1.115253221
1997:03 2.47647802 -2.057818461
1997:04 -1.119104196 3.57089955
1997:05 3.352076269 1.953480438
1997:06 -1.910172239 2.458700404
1997:07 0.142814607 2.992341297
1997:08 10.50199263 -0.457968038
1997:09 12.98501943 8.111278967
1997:10 -4.134761655 -6.967124504
1997:11 -4.148579856 -0.155924791
1997:12 -1.752478236 3.853283433
1998:01 -3.349121498 7.379466014
1998:02 14.07471304 4.299097886
1998:03 7.791650968 3.410780517
1998:04 5.154679109 -0.081494993
1998:05 3.293686179 -1.613131159
1998:06 13.25461802 -0.397288954
1998:07 -7.714205916 -2.237365283
1998:08 15.26340483 -12.4631993
1998:09 15.22865141 -5.170734985
1998:10 15.96218038 11.70544788
1998:11 -8.684089113 -0.380200223
1998:12 17.13842369 4.986705187
1999:01 -1.468448611 2.493727994
1999:02 8.5036 0.937105259
1999:03 10.8943073 4.280082506
1999:04 13.03497394 3.960824402
1999:05 -5.654671597 -4.499198079
1999:06 8.321969316 3.656745699
1999:07 0.507652273 -2.503971473
1999:08 -5.022980561 -0.121901923
1999:09 -2.305448839 -5.388032432
1999:10 -1.876879466 4.010989716
1999:11 1.348824769 6.265312975
1999:12 -2.64164938 4.045658427



Example 6.1 


First we fit model (6.1.3) to these data. Using EViews6 we obtained the following regression results, which are given in the standard EViews format.

Dependent Variable: Y
Method: Least Squares
Sample: 1980M01 1999M12
Included observations: 240

Coefficient Std. Error t-Statistic Prob.
X 1.155512 0.074396 15.53200 0.0000

R-squared 0.500309 Mean dependent var. 0.499826
Adjusted R-squared† 0.500309 S.D. dependent var. 7.849594
S.E. of regression 5.548786 Durbin-Watson stat.* 1.972853
Sum squared resid. 7358.578

*We will discuss this statistic in Chapter 12.
†See Chapter 7.


As these results show, the slope coefficient, which is the Beta coefficient, is highly significant, for its p value is extremely small. The interpretation here is that if the excess market rate goes up by 1 percentage point, the excess return on the index of the consumer goods sector goes up by about 1.15 percentage points. Not only is the slope coefficient statistically significant, but it is significantly greater than 1 (can you verify this?). If a Beta coefficient is greater than 1, such a security (here a portfolio of 104 stocks) is said to be volatile; it moves more than proportionately with the overall stock market index. But this finding should not be surprising, for in this example we are considering stocks from the sector of cyclical consumer goods such as household durables, automobiles, textiles, and sports equipment.
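One way to answer "can you verify this?" is a one-tailed t test of $H_0\colon \beta = 1$ against $H_1\colon \beta > 1$, using the coefficient and standard error from the output above. The sketch assumes that with 239 df the 5% critical t value is well approximated by the standard normal value (about 1.645), and it uses Python's `statistics.NormalDist` for that approximation.

```python
from statistics import NormalDist

# Estimates reported in the EViews output for the regression-through-the-origin model (6.1.3)
beta_hat, se_beta = 1.155512, 0.074396

t_stat = (beta_hat - 1) / se_beta              # H0: beta = 1 against H1: beta > 1
crit = NormalDist().inv_cdf(0.95)              # 5% one-tailed value; normal approximation for 239 df
p_one_tailed = 1 - NormalDist().cdf(t_stat)
print(round(t_stat, 3), round(crit, 3), round(p_one_tailed, 4))
```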
If we fit model (6.1.4), we obtain the following results: 


Dependent Variable: Y
Method: Least Squares
Sample: 1980M01 1999M12
Included observations: 240

Coefficient Std. Error t-Statistic Prob.
C -0.447481 0.362943 -1.232924 0.2188
X 1.171128 0.075386 15.53500 0.0000

R-squared 0.503480 Mean dependent var. 0.499826
Adjusted R-squared 0.501394 S.D. dependent var. 7.849594
S.E. of regression 5.542759 Durbin-Watson stat. 1.984746
Sum squared resid. 7311.877 F-statistic 241.3363
Prob. (F-statistic) 0.000000


From these results we see that the intercept is not statistically different from zero, although the slope coefficient (the Beta coefficient) is highly statistically significant. This suggests that the regression-through-the-origin model fits the data well. Besides, statistically there is no difference in the value of the slope coefficient in the two models. Note that the standard error of the slope coefficient in the regression-through-the-origin model is slightly lower than the one in the intercept-present model, thus supporting Theil's argument given in footnote 4. Even so, the slope coefficient is statistically greater than 1, once again confirming that returns on the stocks in the cyclical consumer goods sector are volatile.
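The same one-tailed test can be run on the intercept-present results. This output also illustrates Theorem 5.7: with a single regressor, the F statistic for $H_0\colon \beta = 0$ equals the square of the slope's t statistic, up to rounding in the printed output. The sketch uses only the reported numbers. The normal approximation to the critical t value at 238 df is an assumption.

```python
from statistics import NormalDist

# Slope estimate, its standard error, and the F statistic from the intercept-present output (6.1.4)
beta_hat, se_beta, f_reported = 1.171128, 0.075386, 241.3363

t_zero = beta_hat / se_beta          # t for H0: beta = 0 (reported as 15.53500)
t_one = (beta_hat - 1) / se_beta     # t for H0: beta = 1 against H1: beta > 1
crit = NormalDist().inv_cdf(0.95)    # normal approximation to the 5% one-tailed t value (238 df)

# With one regressor, F for H0: beta = 0 equals t squared (Theorem 5.7)
print(round(t_zero ** 2, 2), f_reported)
print(round(t_one, 3), t_one > crit)
```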
By the way, note that the $r^2$ value given for the regression-through-the-origin model should be taken with a grain of salt, for the traditional formula of $r^2$ is not applicable for such models. EViews, however, routinely presents the standard $r^2$ value even for such models.


6.2 Scaling and Units of Measurement 


To grasp the ideas developed in this section, consider the data given in Table 6.2, which refers to Indian gross domestic savings (GDS) and gross domestic product (GDP), in rupees crore as well as in rupees lakh, measured in 1999-2000 prices.

Suppose in the regression of GDS on GDP one researcher uses data in rupees crore but another expresses the data in rupees lakh. Will the regression results be the same in both cases? If not, which results should one use? In short, do the units in which the regressand and regressor(s) are measured make any difference in the regression results? If so, what is the sensible course to follow in choosing units of measurement for regression analysis? To answer these questions, let us proceed systematically. Let


$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i \tag{6.2.1}$$

where Y = GDS and X = GDP. Define

$$Y_i^* = w_1 Y_i \tag{6.2.2}$$
$$X_i^* = w_2 X_i \tag{6.2.3}$$

where $w_1$ and $w_2$ are constants, called the scale factors; $w_1$ may equal $w_2$ or be different.

From Equations 6.2.2 and 6.2.3 it is clear that $Y^*$ and $X^*$ are rescaled $Y_i$ and $X_i$. Thus, if $Y_i$ and $X_i$ are measured in rupee crore and one wants to express them in rupee lakh, we will have $Y_i^* = 100Y_i$ and $X_i^* = 100X_i$; here $w_1 = w_2 = 100$.

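Now consider the regression using $Y_i^*$ and $X_i^*$ variables:

$$Y_i^* = \hat{\beta}_1^* + \hat{\beta}_2^* X_i^* + \hat{u}_i^* \tag{6.2.4}$$

where $Y_i^* = w_1 Y_i$, $X_i^* = w_2 X_i$, and $\hat{u}_i^* = w_1 \hat{u}_i$. (Why?)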


Table 6.2 Gross domestic savings and GDP for India, 1951-52 to 2004-05, both at 1999-2000 prices

Year GDS (Rs. crore) GDP (Rs. crore) GDS (Rs. lakh) GDP (Rs. lakh)
1951-52 969 230,034 96,900 23,003,400
1952-53 845 236,562 84,500 23,656,200
1953-54 875 250,960 87,500 25,096,000
1954-55 988 261,615 98,800 26,161,500
1955-56 1,356 268,316 135,600 26,831,600
1956-57 1,561 283,589 156,100 28,358,900
1957-58 1,356 280,160 135,600 28,016,000
1958-59 1,379 301,422 137,900 30,142,200
1959-60 1,720 308,018 172,000 30,801,800
1960-61 1,952 329,825 195,200 32,982,500
1961-62 2,074 340,060 207,400 34,006,000
1962-63 2,440 347,253 244,000 34,725,300
1963-64 2,703 364,834 270,300 36,483,400
1964-65 3,077 392,503 307,700 39,250,300
1965-66 3,833 378,157 383,300 37,815,700
1966-67 4,393 382,006 439,300 38,200,600
1967-68 4,293 413,094 429,300 41,309,400
1968-69 4,657 423,874 465,700 42,387,400
1969-70 6,044 451,496 604,400 45,149,600
1970-71 6,571 474,131 657,100 47,413,100
1971-72 7,281 478,918 728,100 47,891,800
1972-73 7,788 477,392 778,800 47,739,200
1973-74 10,912 499,120 1,091,200 49,912,000
1974-75 12,298 504,914 1,229,800 50,491,400
1975-76 14,196 550,379 1,419,600 55,037,900
1976-77 17,320 557,258 1,732,000 55,725,800
1977-78 19,995 598,885 1,999,500 59,888,500
1978-79 23,601 631,839 2,360,100 63,183,900
1979-80 24,213 598,974 2,421,300 59,897,400
1980-81 26,881 641,921 2,688,100 64,192,100
1981-82 30,896 678,033 3,089,600 67,803,300
1982-83 33,787 697,861 3,378,700 69,786,100
1983-84 38,091 752,669 3,809,100 75,266,900
1984-85 45,453 782,484 4,545,300 78,248,400
1985-86 53,389 815,049 5,338,900 81,504,900
1986-87 58,036 850,217 5,803,600 85,021,700
1987-88 72,264 880,267 7,226,400 88,026,700
1988-89 87,166 969,702 8,716,600 96,970,200
1989-90 106,092 1,029,178 10,609,200 102,917,800
1990-91 130,010 1,083,572 13,001,000 108,357,200
1991-92 141,089 1,099,072 14,108,900 109,907,200
1992-93 159,682 1,158,025 15,968,200 115,802,500
1993-94 189,933 1,223,816 18,993,300 122,381,600
1994-95 247,462 1,302,076 24,746,200 130,207,600
1995-96 291,002 1,396,974 29,100,200 139,697,400
1996-97 313,068 1,508,378 31,306,800 150,837,800
1997-98 363,506 1,573,263 36,350,600 157,326,300
1998-99 389,747 1,678,410 38,974,700 167,841,000
1999-00 484,256 1,786,526 48,425,600 178,652,600
2000-01 499,033 1,864,773 49,903,300 186,477,300
2001-02 534,885 1,972,912 53,488,500 197,291,200
2002-03 646,521 2,047,733 64,652,100 204,773,300
2003-04 820,685 2,222,591 82,068,500 222,259,100
2004-05 997,873 2,389,660 99,787,300 238,966,000

Source: Handbook of Statistics on Indian Economy, 2009-10, RBI, Mumbai.
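We want to find out the relationships between the following pairs:

1. $\hat{\beta}_1$ and $\hat{\beta}_1^*$
2. $\hat{\beta}_2$ and $\hat{\beta}_2^*$
3. $\operatorname{var}(\hat{\beta}_1)$ and $\operatorname{var}(\hat{\beta}_1^*)$
4. $\operatorname{var}(\hat{\beta}_2)$ and $\operatorname{var}(\hat{\beta}_2^*)$
5. $\hat{\sigma}^2$ and $\hat{\sigma}^{*2}$
6. $r^2_{xy}$ and $r^2_{x^*y^*}$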


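From least-squares theory we know (see Chapter 3) that

$$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X} \tag{6.2.5}$$
$$\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2} \tag{6.2.6}$$
$$\operatorname{var}(\hat{\beta}_1) = \frac{\sum X_i^2}{n \sum x_i^2}\,\sigma^2 \tag{6.2.7}$$
$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_i^2} \tag{6.2.8}$$
$$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n-2} \tag{6.2.9}$$

Applying the OLS method to Equation 6.2.4, we obtain similarly

$$\hat{\beta}_1^* = \bar{Y}^* - \hat{\beta}_2^* \bar{X}^* \tag{6.2.10}$$
$$\hat{\beta}_2^* = \frac{\sum x_i^* y_i^*}{\sum x_i^{*2}} \tag{6.2.11}$$
$$\operatorname{var}(\hat{\beta}_1^*) = \frac{\sum X_i^{*2}}{n \sum x_i^{*2}}\,\sigma^{*2} \tag{6.2.12}$$
$$\operatorname{var}(\hat{\beta}_2^*) = \frac{\sigma^{*2}}{\sum x_i^{*2}} \tag{6.2.13}$$
$$\hat{\sigma}^{*2} = \frac{\sum \hat{u}_i^{*2}}{n-2} \tag{6.2.14}$$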

From these results it is easy to establish relationships between the two sets of parameter estimates. All that one has to do is recall these definitional relationships: $Y_i^* = w_1 Y_i$ (or $y_i^* = w_1 y_i$); $X_i^* = w_2 X_i$ (or $x_i^* = w_2 x_i$); $\hat{u}_i^* = w_1 \hat{u}_i$; $\bar{Y}^* = w_1 \bar{Y}$; and $\bar{X}^* = w_2 \bar{X}$. Making use of these definitions, the reader can easily verify that


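$$\hat{\beta}_2^* = \left(\frac{w_1}{w_2}\right)\hat{\beta}_2 \tag{6.2.15}$$
$$\hat{\beta}_1^* = w_1 \hat{\beta}_1 \tag{6.2.16}$$
$$\hat{\sigma}^{*2} = w_1^2 \hat{\sigma}^2 \tag{6.2.17}$$
$$\operatorname{var}(\hat{\beta}_1^*) = w_1^2 \operatorname{var}(\hat{\beta}_1) \tag{6.2.18}$$
$$\operatorname{var}(\hat{\beta}_2^*) = \left(\frac{w_1}{w_2}\right)^2 \operatorname{var}(\hat{\beta}_2) \tag{6.2.19}$$
$$r^2_{xy} = r^2_{x^*y^*} \tag{6.2.20}$$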


From the preceding results it should be clear that, given the regression results based on one scale of 
measurement, one can derive the results based on another scale of measurement once the scaling factors, the 
w’s, are known. In practice, though, one should choose the units of measurement sensibly; there is little point 
in carrying all those zeros in expressing numbers in lakh or crores of rupees. 

From the results given in (6.2.15) through (6.2.20) one can easily derive some special cases. For instance, if $w_1 = w_2$, that is, the scaling factors are identical, the slope coefficient and its standard error remain unaffected in going from the $(Y_i, X_i)$ to the $(Y_i^*, X_i^*)$ scale, which should be intuitively clear. However, the intercept and its standard error are both multiplied by $w_1$. But if the X scale is not changed (i.e., $w_2 = 1$) and the Y scale is changed by the factor $w_1$, the slope as well as the intercept coefficients and their respective standard errors are all multiplied by the same $w_1$ factor. Finally, if the Y scale remains unchanged (i.e., $w_1 = 1$) but the X scale is changed by the factor $w_2$, the slope coefficient and its standard error are multiplied by the factor $(1/w_2)$ but the intercept coefficient and its standard error remain unaffected.

It should, however, be noted that the transformation from the $(Y, X)$ to the $(Y^*, X^*)$ scale does not affect the properties of the OLS estimators discussed in the preceding chapters.
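The scaling results can be checked numerically. The Python sketch below assumes NumPy is installed, and the helper names `ols` and `check_scaling` are our own. It computes the two-variable OLS quantities directly from Eqs. (6.2.5) through (6.2.9), with $\sigma^2$ replaced by its estimate $\hat{\sigma}^2$. It then refits after multiplying Y by $w_1$ and X by $w_2$, and reports whether each rescaled quantity equals the value predicted by Eqs. (6.2.15) through (6.2.20). To keep the example short, it uses only the last eight rows of Table 6.2, so its estimates will differ from those in Example 6.2.

```python
import numpy as np


def ols(y, x):
    """Two-variable OLS using the formulas in Eqs. (6.2.5)-(6.2.9)."""
    n = len(y)
    xd, yd = x - x.mean(), y - y.mean()        # deviations from sample means
    sxx = np.sum(xd ** 2)
    b2 = np.sum(xd * yd) / sxx                  # slope, Eq. (6.2.6)
    b1 = y.mean() - b2 * x.mean()               # intercept, Eq. (6.2.5)
    resid = y - b1 - b2 * x
    s2 = np.sum(resid ** 2) / (n - 2)           # sigma-hat squared, Eq. (6.2.9)
    return {
        "b1": b1,
        "b2": b2,
        "var_b1": np.sum(x ** 2) / (n * sxx) * s2,  # Eq. (6.2.7), sigma^2 estimated
        "var_b2": s2 / sxx,                         # Eq. (6.2.8), sigma^2 estimated
        "s2": s2,
        "r2": np.sum(xd * yd) ** 2 / (sxx * np.sum(yd ** 2)),
    }


def check_scaling(y, x, w1, w2):
    """Refit on (w1*Y, w2*X) and compare with Eqs. (6.2.15)-(6.2.20)."""
    base, scaled = ols(y, x), ols(w1 * y, w2 * x)
    predicted = {
        "b2": (w1 / w2) * base["b2"],               # Eq. (6.2.15)
        "b1": w1 * base["b1"],                      # Eq. (6.2.16)
        "s2": w1 ** 2 * base["s2"],                 # Eq. (6.2.17)
        "var_b1": w1 ** 2 * base["var_b1"],         # Eq. (6.2.18)
        "var_b2": (w1 / w2) ** 2 * base["var_b2"],  # Eq. (6.2.19)
        "r2": base["r2"],                           # Eq. (6.2.20)
    }
    return {k: bool(np.isclose(scaled[k], v, rtol=1e-9)) for k, v in predicted.items()}


# GDS and GDP in Rs. crore, 1997-98 to 2004-05 (last eight rows of Table 6.2)
gds = np.array([363506, 389747, 484256, 499033, 534885, 646521, 820685, 997873], dtype=float)
gdp = np.array([1573263, 1678410, 1786526, 1864773, 1972912, 2047733, 2222591, 2389660], dtype=float)

for w1, w2 in [(100, 100), (100, 1), (1, 100)]:
    print(w1, w2, check_scaling(gds, gdp, w1, w2))
```

Each printed dictionary should contain only True values. The pair (1, 100) converts only GDP from crore to lakh, as in Eq. (6.2.23). The pair (100, 1) converts only GDS, as in Eq. (6.2.24).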


Example 6.2 The relationship between GDS and GDP, India, 1951-52 to 2004-05 


To substantiate the preceding theoretical results, let us return to the data given in Table 6.2 and examine the following results (numbers in parentheses are the estimated standard errors).

Both GDS and GDP in rupee crore:
$$\widehat{\text{GDS}}_t = -167423.37 + 0.36\,\text{GDP}_t \qquad se = (17721.01)\;(0.02) \qquad r^2 = 0.8891 \tag{6.2.21}$$

Both GDS and GDP in rupee lakh:
$$\widehat{\text{GDS}}_t = -16742336.51 + 0.36\,\text{GDP}_t \qquad se = (1772100.74)\;(0.02) \qquad r^2 = 0.8891 \tag{6.2.22}$$

Notice that the intercept and its standard error are 100 times the corresponding values in regression (6.2.21) (note that $w_1 = w_2 = 100$ in going from crore to lakh of rupees), but the slope coefficient as well as its standard error is unchanged, in accordance with the theory.

GDS in rupee crore and GDP in rupee lakh:
$$\widehat{\text{GDS}}_t = -167423.37 + 0.0036\,\text{GDP}_t \qquad se = (17721.01)\;(0.0002) \qquad r^2 = 0.8891 \tag{6.2.23}$$

As expected, the slope coefficient as well as its standard error is 1/100 of its value in Eq. (6.2.21), since only the X, or GDP, scale is changed.

GDS in rupee lakh and GDP in rupee crore:
$$\widehat{\text{GDS}}_t = -16742336.51 + 36.33\,\text{GDP}_t \qquad se = (1772100.74)\;(1.78) \qquad r^2 = 0.8891 \tag{6.2.24}$$

Again note that both the intercept and the slope coefficients as well as their respective standard errors are 100 times their values in Eq. (6.2.21), in accordance with our theoretical results.
Notice that in all the regressions presented above, the $r^2$ value remains the same. This is not surprising, because the $r^2$ value is invariant to changes in the unit of measurement, as it is a pure, or dimensionless, number.
A Word about Interpretation


Since the slope coefficient $\beta_2$ is simply the rate of change, it is measured in the units of the ratio

$$\frac{\text{Units of the dependent variable}}{\text{Units of the explanatory variable}}$$


Thus in regression (6.2.21) the interpretation of the slope coefficient 0.36 is that if GDP changes by a unit, which is 1 crore rupees, GDS on the average changes by 0.36 crore rupees. In regression (6.2.23) a unit change in GDP, which is 1 lakh rupees, leads on average to a 0.0036 crore rupee change in GDS. Since 1 crore = 100 lakh, a change of 0.0036 crore per lakh of GDP is the same as a change of 0.36 crore per crore of GDP. The two results are therefore identical in the effects of GDP on GDS; they are simply expressed in different units of measurement.


6.3 Regression on Standardized Variables 


We saw in the previous section that the units in which the regressand and regressor(s) are expressed affect the interpretation of the regression coefficients. This can be avoided if we are willing to express the regressand and regressor(s) as standardized variables. A variable is said to be standardized if we subtract the mean value of the variable from its individual values and divide the difference by the standard deviation of that variable. Thus, in the regression of Y and X, if we redefine these variables as


$$Y_i^* = \frac{Y_i - \bar{Y}}{S_Y} \tag{6.3.1}$$
$$X_i^* = \frac{X_i - \bar{X}}{S_X} \tag{6.3.2}$$

where $\bar{Y}$ = sample mean of Y, $S_Y$ = sample standard deviation of Y, $\bar{X}$ = sample mean of X, and $S_X$ is the sample standard deviation of X; the variables $Y_i^*$ and $X_i^*$ are called standardized variables.

An interesting property of a standardized variable is that its mean value is always zero and its standard 
deviation is always 1. (For proof, see Appendix 6A, Section 6A.2.) 

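As a result, the standardized variables carry no units, so it does not matter in what units the regressand and regressor(s) were originally measured. Therefore, instead of running the standard (bivariate) regression:

$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i \tag{6.3.3}$$

we could run regression on the standardized variables as

$$Y_i^* = \hat{\beta}_1^* + \hat{\beta}_2^* X_i^* + \hat{u}_i^* \tag{6.3.4}$$
$$Y_i^* = \hat{\beta}_2^* X_i^* + \hat{u}_i^* \tag{6.3.5}$$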


since it is easy to show that, in the regression involving standardized regressand and regressor(s), the intercept term is always zero.⁶ The regression coefficients of the standardized variables, denoted by $\hat{\beta}_1^*$ and $\hat{\beta}_2^*$, are known in the literature as the beta coefficients.⁷ Incidentally, notice that (6.3.5) is a regression through the origin.


⁶Recall from Eq. (3.1.7) that Intercept = Mean value of the dependent variable − Slope × Mean value of the regressor. But for the standardized variables the mean values of the dependent variable and the regressor are zero. Hence the intercept value is zero.


⁷Do not confuse these beta coefficients with the beta coefficients of finance theory.
How do we interpret the beta coefficients? The interpretation is that if the (standardized) regressor increases by one standard deviation, on average, the (standardized) regressand increases by $\hat{\beta}_2^*$ standard deviation units. Thus, unlike the traditional model in Eq. (6.3.3), we measure the effect not in terms of the original units in which Y and X are expressed, but in standard deviation units.

To show the difference between Eqs. (6.3.3) and (6.3.5), let us return to the GDS and GDP example discussed in the preceding section. The results of (6.2.21) discussed previously are reproduced here for convenience.

$$\widehat{\text{GDS}}_t = -167423.37 + 0.36\,\text{GDP}_t \qquad se = (17721.01)\;(0.02) \qquad r^2 = 0.8891 \tag{6.3.6}$$

where both GDS and GDP are measured in rupee crore. The results corresponding to Eq. (6.3.5) are as follows, where the starred variables are standardized variables:

$$\widehat{\text{GDS}}_t^* = 0.94\,\text{GDP}_t^* \qquad se = (0.05) \tag{6.3.7}$$


We know how to interpret Eq. (6.3.6): If GDP goes up by a rupee, on average GDS goes up by about 36 paisa. How about Eq. (6.3.7)? Here the interpretation is that if the (standardized) GDP increases by one standard deviation, on average, the (standardized) GDS increases by about 0.94 standard deviations.

What is the advantage of the standardized regression model over the traditional model? The advantage becomes more apparent if there is more than one regressor, a topic we will take up in Chapter 7. By standardizing all regressors, we put them on an equal basis and therefore can compare them directly. If the coefficient of a standardized regressor is larger than that of another standardized regressor appearing in that model, then the former contributes relatively more to the explanation of the regressand than the latter. In other words, we can use the beta coefficients as a measure of relative strength of the various regressors. But more on this in the next two chapters.

Before we leave this topic, two points may be noted. First, for the standardized regression in Eq. (6.3.7) we have not given the $r^2$ value because this is a regression through the origin, for which the usual $r^2$ is not applicable, as pointed out in Section 6.1. Second, there is an interesting relationship between the $\hat{\beta}$ coefficients of the conventional model and the beta coefficients. For the bivariate case, the relationship is as follows:

$$\hat{\beta}_2^* = \hat{\beta}_2\left(\frac{S_x}{S_y}\right) \tag{6.3.8}$$

where $S_x$ = the sample standard deviation of the X regressor and $S_y$ = the sample standard deviation of the regressand. Therefore, we can crisscross between the $\hat{\beta}$ and beta coefficients if we know the (sample) standard deviation of the regressor and regressand. We will see in the next chapter that this relationship holds true in multiple regression as well. It is left as an exercise for the reader to verify Eq. (6.3.8) for our illustrative example.
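The verification of Eq. (6.3.8) can be sketched in Python. This is a minimal illustration that assumes NumPy, and the function names are hypothetical. It again uses only the last eight rows of Table 6.2, so the numbers will not match Eq. (6.3.7), which is based on the full sample. The script standardizes both variables as in Eqs. (6.3.1) and (6.3.2), fits the regression through the origin (6.3.5), and compares the slope with the right-hand side of Eq. (6.3.8).

```python
import numpy as np


def standardize(v):
    """Eqs. (6.3.1)-(6.3.2): deviation from the sample mean over the sample s.d."""
    return (v - v.mean()) / v.std(ddof=1)


def conventional_slope(y, x):
    """OLS slope of Y on X with an intercept, as in Eq. (6.3.3)."""
    xd, yd = x - x.mean(), y - y.mean()
    return np.sum(xd * yd) / np.sum(xd ** 2)


def beta_coefficient(y, x):
    """Slope of the regression through the origin of Y* on X*, Eq. (6.3.5)."""
    ys, xs = standardize(y), standardize(x)
    return np.sum(xs * ys) / np.sum(xs ** 2)


# GDS and GDP in Rs. crore, 1997-98 to 2004-05 (last eight rows of Table 6.2)
gds = np.array([363506, 389747, 484256, 499033, 534885, 646521, 820685, 997873], dtype=float)
gdp = np.array([1573263, 1678410, 1786526, 1864773, 1972912, 2047733, 2222591, 2389660], dtype=float)

beta2_star = beta_coefficient(gds, gdp)
rhs_638 = conventional_slope(gds, gdp) * gdp.std(ddof=1) / gds.std(ddof=1)  # Eq. (6.3.8)
r_xy = np.corrcoef(gdp, gds)[0, 1]

z = standardize(gdp)
print(f"mean of X* = {z.mean():.2e}, s.d. of X* = {z.std(ddof=1):.3f}")
print(f"beta coefficient = {beta2_star:.4f}, via Eq. (6.3.8) = {rhs_638:.4f}, r = {r_xy:.4f}")
```

The standard deviations use `ddof=1`. Any consistent divisor gives the same beta coefficient, because only the ratio $S_x/S_y$ enters Eq. (6.3.8). In the two-variable case the beta coefficient also equals the sample correlation coefficient r between X and Y. This agrees with Eq. (6.3.7), since $\sqrt{0.8891} \approx 0.943$.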


6.4 Functional Forms of Regression Models 


As noted in Chapter 2, this text is concerned primarily with models that are linear in the parameters; they may 
or may not be linear in the variables. In the sections that follow we consider some commonly used regression 
models that may be nonlinear in the variables but are linear in the parameters or that can be made so by 
suitable transformations of the variables. In particular, we discuss the following regression models: 

1. The log-linear model 
2. Semilog models

3. Reciprocal models 

4. The logarithmic reciprocal model 

We discuss the special features of each model, when they are appropriate, and how they are estimated. 
Each model is illustrated with suitable examples. 


6.5 How to Measure Elasticity: The Log-Linear Model 


Consider the following model, known as the exponential regression model:

$$Y_i = \beta_1 X_i^{\beta_2} e^{u_i} \tag{6.5.1}$$

which may be expressed alternatively as⁸

$$\ln Y_i = \ln \beta_1 + \beta_2 \ln X_i + u_i \tag{6.5.2}$$

where ln = natural log (i.e., log to the base e, and where e = 2.718).⁹ If we write Eq. (6.5.2) as

$$\ln Y_i = \alpha + \beta_2 \ln X_i + u_i \tag{6.5.3}$$

where $\alpha = \ln \beta_1$, this model is linear in the parameters $\alpha$ and $\beta_2$, linear in the logarithms of the variables Y and X, and can be estimated by OLS regression. Because of this linearity, such models are called log-log, double-log, or log-linear models. See Appendix 6A.3 for the properties of logarithms.

If the assumptions of the classical linear regression model are fulfilled, the parameters of Eq. (6.5.3) can be estimated by the OLS method by letting

$$Y_i^* = \alpha + \beta_2 X_i^* + u_i \tag{6.5.4}$$

where $Y_i^* = \ln Y_i$ and $X_i^* = \ln X_i$. The OLS estimators $\hat{\alpha}$ and $\hat{\beta}_2$ obtained will be best linear unbiased estimators of $\alpha$ and $\beta_2$, respectively.

One attractive feature of the log-log model, which has made it popular in applied work, is that the slope coefficient $\beta_2$ measures the elasticity of Y with respect to X, that is, the percentage change in Y for a given (small) percentage change in X.¹⁰ Thus, if Y represents the quantity of a commodity demanded and X its unit price, $\beta_2$ measures the price elasticity of demand, a parameter of considerable economic interest. If the


⁸Note these properties of the logarithms: (1) $\ln(AB) = \ln A + \ln B$, (2) $\ln(A/B) = \ln A - \ln B$, and (3) $\ln(A^k) = k \ln A$, assuming that A and B are positive, and where k is some constant.

⁹In practice one may use common logarithms, that is, log to the base 10. The relationship between the natural log and common log is: $\ln_e X = 2.3026 \log_{10} X$. By convention, ln means natural logarithm, and log means logarithm to the base 10; hence there is no need to write the subscripts e and 10 explicitly.

¹⁰The elasticity coefficient, in calculus notation, is defined as $(dY/Y)/(dX/X) = [(dY/dX)(X/Y)]$. Readers familiar with differential calculus will readily see that $\beta_2$ is in fact the elasticity coefficient.

A technical note: The calculus-minded reader will note that $d(\ln X)/dX = 1/X$ or $d(\ln X) = dX/X$, that is, for infinitesimally small changes (note the differential operator d) the change in ln X is equal to the relative or proportional change in X. In practice, though, if the change in X is small, this relationship can be written as: change in ln X ≈ relative change in X, where ≈ means approximately. Thus, for small changes,

$$(\ln X_t - \ln X_{t-1}) \approx (X_t - X_{t-1})/X_{t-1} = \text{relative change in } X$$


Incidentally, the reader should note these terms, which will occur frequently: (1) absolute change, (2) relative or proportional change, and (3) percentage change, or percent growth rate. Thus, $(X_t - X_{t-1})$ represents absolute change, $(X_t - X_{t-1})/X_{t-1} = (X_t/X_{t-1} - 1)$ is relative or proportional change, and $[(X_t - X_{t-1})/X_{t-1}]100$ is the percentage change, or the growth rate. $X_t$ and $X_{t-1}$ are, respectively, the current and previous values of the variable X.
relationship between quantity demanded and price is as shown in Figure 6.3a, the double-log transformation as shown in Figure 6.3b will then give the estimate of the price elasticity ($-\beta_2$).

Two special features of the log-linear model may be noted. First, the model assumes that the elasticity coefficient between Y and X, $\beta_2$, remains constant throughout (why?), hence the alternative name constant elasticity model.¹¹ In other words, as Figure 6.3b shows, the change in ln Y per unit change in ln X (i.e., the elasticity, $\beta_2$) remains the same no matter at which ln X we measure the elasticity. Second, although $\hat{\alpha}$ and $\hat{\beta}_2$ are unbiased estimates of $\alpha$ and $\beta_2$, $\beta_1$ (the parameter entering the original model), when estimated as $\hat{\beta}_1 = \text{antilog}(\hat{\alpha})$, is itself a biased estimator. In most practical problems, however, the intercept term is of secondary importance, and one need not worry about obtaining its unbiased estimate.¹²

In the two-variable model, the simplest way to decide whether the log-linear model fits the data is to plot the scattergram of $\ln Y_i$ against $\ln X_i$ and see if the scatter points lie approximately on a straight line, as in Figure 6.3b.


Figure 6.3 Constant elasticity model. (a) Y against price; (b) ln Y against log of price (ln X), showing the line $\ln Y = \ln \beta_1 - \beta_2 \ln X_i$.


A cautionary note: The reader should be aware of the distinction between a percent change and a percentage point change. For example, the unemployment rate is often expressed in percent form, say, an unemployment rate of 6%. If this rate goes to 8%, we say that the percentage point change in the unemployment rate is 2, whereas the percent change in the unemployment rate is (8 − 6)/6, or about 33%. So be careful when you deal with percent and percentage point changes, for the two are very different concepts.
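The unemployment example above can be written as two small Python functions, which makes the difference between the two measures explicit. The function names are illustrative only.

```python
def percentage_point_change(old_rate, new_rate):
    """Change in a rate that is itself measured in percent."""
    return new_rate - old_rate


def percent_change(old_value, new_value):
    """Relative change multiplied by 100."""
    return 100 * (new_value - old_value) / old_value


# Unemployment rate rising from 6% to 8%, as in the text
print(percentage_point_change(6, 8))      # 2
print(round(percent_change(6, 8), 1))     # 33.3
```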


Example 6.3: Expenditure on Durable Goods in Relation to Total Private Final Consumption Expenditure

Table 6.3 presents data on total private final consumption expenditure (PFCE) in the domestic market, expenditure on durable goods (ExpDur), expenditure on nondurable goods (ExpNonDur), and expenditure on services (ExpServ), all measured in rupee crore for 1984-85 to 2005-06, at 1999-2000 prices.¹³


¹¹A constant elasticity model will give a constant total revenue change for a given percentage change in price regardless of the absolute level of price. Readers should contrast this result with the elasticity conditions implied by a simple linear demand function, $Y_i = \beta_1 + \beta_2 X_i + u_i$. However, a simple linear function gives a constant quantity change per unit change in price. Contrast this with what the log-linear model implies for a given dollar change in price.

¹²Concerning the nature of the bias and what can be done about it, see Arthur S. Goldberger, Topics in Regression Analysis, Macmillan, New York, 1978, p. 120.

¹³Nondurables include food, beverage and tobacco; durable goods include clothing, footwear, furniture, furnishing, appliances and services; and services include medical care, health care, transport, communication, recreation, education, and cultural services.
Suppose we wish to find the elasticity of expenditure on durable goods with respect to total private final consumption expenditure. If you plot the log of expenditure on durable goods against the log of total private final consumption expenditure, you will see that the relationship between the two variables is approximately linear. Hence, the double-log model may be appropriate. The regression results are as follows:

$$\widehat{\ln \text{ExpDur}}_t = -2.409 + 0.999 \ln \text{PFCE}_t \qquad se = (0.414)\;(0.030) \qquad t = (-5.813)^*\;(33.339)^* \qquad r^2 = 0.982 \tag{6.5.5}$$

where * indicates that the P value is extremely small.


Table 6.3 Total Private Final Consumption Expenditure and Categories (in Rs. crore at 1999-2000 prices) for the period 1984-85 to 2005-06

Year ExpNonDur ExpDur ExpServ PFCE
1984-85 389306 56532 85123 651495
1985-86 402672 61210 89507 680420
1986-87 403558 63694 99668 698991
1987-88 411811 65683 106923 721499
1988-89 441161 70338 115102 769480
1989-90 457060 72856 124161 804428
1990-91 478056 76752 133395 843556
1991-92 486943 73143 140117 860869
1992-93 488447 74744 147772 879689
1993-94 512281 81260 155897 918944
1994-95 529468 81057 172656 961346
1995-96 552288 87475 190179 1018423
1996-97 599003 92736 208281 1097390
1997-98 588457 99721 223813 1122195
1998-99 634428 96071 239920 1190267
1999-00 647011 107231 262128 1257541
2000-01 618316 121302 295235 1292986
2001-02 654176 120616 318564 1367758
2002-03 626269 125855 354057 1397069
2003-04 657626 127776 399568 1493871
2004-05 652864 142936 451773 1579255
2005-06 684665 158738 494747 1689861

Source: National Accounts Statistics, 2000-2007, CSO, Govt. of India.


As these results show, the elasticity of ExpDur with respect to PFCE is about 1.0, suggesting that if private 
final consumption expenditure goes up by 1 percent, on average, the expenditure on durable goods goes 
up by about 1 percent. Thus, expenditure on durable goods is very responsive to changes in private final 
consumption expenditure. This is one reason why producers of durable goods keep a keen eye on changes in 
private final consumption expenditure. In Exercise 6.18, the reader is asked to carry out a similar exercise for 
nondurable goods expenditure. 
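The following minimal sketch shows how the double-log regression (6.5.5) might be reproduced, assuming NumPy. It takes natural logarithms of ExpDur and PFCE from Table 6.3, as in Eq. (6.5.4), and applies OLS to the transformed data. The estimated slope is the elasticity. The helper `fit_log_log` is a hypothetical name. Its printed output is not reproduced here and should be compared with Eq. (6.5.5).

```python
import numpy as np


def fit_log_log(y, x):
    """OLS of ln Y on ln X, Eq. (6.5.4); returns (alpha_hat, beta2_hat)."""
    ly, lx = np.log(y), np.log(x)              # natural logs, as in the text
    lxd = lx - lx.mean()
    beta2 = np.sum(lxd * (ly - ly.mean())) / np.sum(lxd ** 2)
    alpha = ly.mean() - beta2 * lx.mean()
    return alpha, beta2


# ExpDur and PFCE (Rs. crore), 1984-85 to 2005-06, from Table 6.3
exp_dur = np.array([56532, 61210, 63694, 65683, 70338, 72856, 76752, 73143, 74744, 81260, 81057,
                    87475, 92736, 99721, 96071, 107231, 121302, 120616, 125855, 127776, 142936,
                    158738], dtype=float)
pfce = np.array([651495, 680420, 698991, 721499, 769480, 804428, 843556, 860869, 879689, 918944,
                 961346, 1018423, 1097390, 1122195, 1190267, 1257541, 1292986, 1367758, 1397069,
                 1493871, 1579255, 1689861], dtype=float)

alpha_hat, elasticity = fit_log_log(exp_dur, pfce)
print(f"alpha = {alpha_hat:.3f}, elasticity = {elasticity:.3f}")
```

Because the slope is estimated in logs, it can be read directly as an elasticity. No multiplication by X/Y is needed, unlike in the linear model.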
6.6 Semilog Models: Log-Lin and Lin-Log Models


How to Measure the Growth Rate: The Log-Lin Model 


Economists, businesspeople, and governments are often interested in finding out the rate of growth of certain 
economic variables, such as population, GNP, money supply, employment, productivity, and trade deficit. 

Suppose we want to find out the growth rate of private final consumption expenditure on services for the data given in Table 6.3. Let $Y_t$ denote real expenditure on services at time t and $Y_0$ the initial value of the expenditure on services (i.e., the value at 1983-84). You may recall the following well-known compound interest formula from your introductory course in economics.


$$Y_t = Y_0(1 + r)^t \tag{6.6.1}$$


where r is the compound (i.e., over time) rate of growth of Y. Taking the natural logarithm of Equation 6.6.1, we can write

$$\ln Y_t = \ln Y_0 + t \ln(1 + r) \tag{6.6.2}$$

Now letting

$$\beta_1 = \ln Y_0 \tag{6.6.3}$$
$$\beta_2 = \ln(1 + r) \tag{6.6.4}$$

we can write Equation 6.6.2 as

$$\ln Y_t = \beta_1 + \beta_2 t \tag{6.6.5}$$

Adding the disturbance term to Equation 6.6.5, we obtain¹⁴

$$\ln Y_t = \beta_1 + \beta_2 t + u_t \tag{6.6.6}$$

This model is like any other linear regression model in that the parameters $\beta_1$ and $\beta_2$ are linear. The only difference is that the regressand is the logarithm of Y and the regressor is "time," which will take values of 1, 2, 3, etc.

Models like Eq. (6.6.6) are called semilog models because only one variable (in this case the regressand) appears in the logarithmic form. For descriptive purposes a model in which the regressand is logarithmic will be called a log-lin model. Later we will consider a model in which the regressand is linear but the regressor(s) is logarithmic and call it a lin-log model.

Before we present the regression results, let us examine the properties of model (6.6.5). In this model the slope coefficient measures the constant proportional or relative change in Y for a given absolute change in the value of the regressor (in this case the variable t), that is,¹⁵

$$\beta_2 = \frac{\text{relative change in regressand}}{\text{absolute change in regressor}} \tag{6.6.7}$$


¹⁴We add the error term because the compound interest formula will not hold exactly. Why we add the error after the logarithmic transformation is explained in Sec. 6.8.

¹⁵Using differential calculus one can show that $\beta_2 = d(\ln Y)/dX = (1/Y)(dY/dX) = (dY/Y)/dX$, which is nothing but Eq. (6.6.7). For small changes in Y and X this relation may be approximated by

$$\beta_2 \approx \frac{(Y_t - Y_{t-1})/Y_{t-1}}{X_t - X_{t-1}}$$

Note: Here, X = t.
If we multiply the relative change in Y by 100, Equation 6.6.7 will then give the percentage change, or the growth rate, in Y for an absolute change in X, the regressor. That is, 100 times $\beta_2$ gives the growth rate in Y; 100 times $\beta_2$ is known in the literature as the semielasticity of Y with respect to X. (Question: To get the elasticity, what will we have to do?)¹⁶


Example 6.4 The Rate of Growth of Expenditure on Services


To illustrate the growth model (6.6.6), consider the data on expenditure on services given in Table 6.3. The regression results over time (t) are as follows:

$$\widehat{\ln \text{ExServ}}_t = 11.216 + 0.082\,t \qquad se = (0.022)\;(0.002) \qquad t = (508.608)^*\;(48.739)^* \qquad r^2 = 0.992 \tag{6.6.8}$$

Note: ExServ stands for expenditure on services and * denotes that the P value is extremely small.

The interpretation of Eq. (6.6.8) is that over the period 1984-85 to 2005-06, expenditure on services 
increased at the (yearly) rate of 8.2 percent. Since 11.216 = log of ExServ at the beginning of the study period, 
by taking antilog we obtain 74309.94 (crore rupees) as the ExpServ at the end of 1983-84. The regression line 
obtained in Eq. (6.6.8) is sketched in Figure 6.4. 
Figure 6.4 Log of expenditure on services, with the fitted regression line of Eq. (6.6.8).


Instantaneous versus Compound Rate of Growth 


The coefficient of the trend variable in the growth model (6.6.6), $\beta_2$, gives the instantaneous (at a point in time) rate of growth and not the compound (over a period of time) rate of growth. But the latter can be easily found from Eq. (6.6.4) by taking the antilog of the estimated $\beta_2$, subtracting 1 from it, and multiplying the difference by 100. For our illustrative example, the estimated slope coefficient is 0.082. Therefore, [antilog(0.082) − 1] = 0.085, or 8.5 percent. Thus, in the illustrative example, the compound rate of growth of expenditure on services was about 8.5 percent per year, which is slightly higher than the instantaneous growth rate of 8.2 percent. This is of course due to the compounding effect.
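Both growth rates can be computed from the log-lin fit. The sketch below assumes NumPy, and `growth_rates` is a hypothetical helper. It regresses the log of ExpServ from Table 6.3 on t = 1, 2, ..., 22, as in Eq. (6.6.6). It then reports 100 times the slope as the instantaneous rate and 100[antilog(slope) − 1] as the compound rate, using $\beta_2 = \ln(1 + r)$ from Eq. (6.6.4). We assume t = 1 corresponds to 1984-85, which makes the intercept refer to 1983-84, as in the text.

```python
import numpy as np


def growth_rates(y):
    """Fit ln Y_t = b1 + b2*t (Eq. 6.6.6) by OLS with t = 1, 2, ..., n.

    Returns (b1, b2, instantaneous % rate, compound % rate)."""
    t = np.arange(1, len(y) + 1, dtype=float)
    ly = np.log(y)
    td = t - t.mean()
    b2 = np.sum(td * (ly - ly.mean())) / np.sum(td ** 2)
    b1 = ly.mean() - b2 * t.mean()
    instantaneous = 100 * b2                  # 100 x slope
    compound = 100 * (np.exp(b2) - 1)         # invert b2 = ln(1 + r), Eq. (6.6.4)
    return b1, b2, instantaneous, compound


# ExpServ (Rs. crore), 1984-85 to 2005-06, from Table 6.3
ex_serv = np.array([85123, 89507, 99668, 106923, 115102, 124161, 133395, 140117, 147772, 155897,
                    172656, 190179, 208281, 223813, 239920, 262128, 295235, 318564, 354057,
                    399568, 451773, 494747], dtype=float)

b1, b2, inst, comp = growth_rates(ex_serv)
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}, instantaneous = {inst:.1f}%, compound = {comp:.1f}%")

# The text's own arithmetic: a slope of 0.082 implies compound growth of about 8.5 percent
print(round(100 * (np.exp(0.082) - 1), 1))    # 8.5
```

Because $e^{b} - 1 > b$ for any positive b, the compound rate always exceeds the instantaneous rate when growth is positive.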


¹⁶See Appendix 6A.4 for various growth formulas.
Linear Trend Model


Instead of estimating model (6.6.6), researchers sometimes estimate the following model: 


$$Y_t = \beta_1 + \beta_2 t + u_t \tag{6.6.9}$$
That is, instead of regressing the log of Y on time, they regress Y on time, where Y is the regressand under 
consideration. Such a model is called a linear trend model and the time variable t is known as the trend 
variable. If the slope coefficient in Equation 6.6.9 is positive, there is an upward trend in Y, whereas if it is 
negative, there is a downward trend in Y. 

For the expenditure on services data that we considered earlier, the results of fitting the linear trend model (6.6.9) are as follows:

$$\widehat{\text{ExServ}}_t = 15783.87 + 17633.76\,t \qquad t = (0.91)\;(13.38) \qquad r^2 = 0.8996 \tag{6.6.10}$$

In contrast to Eq. (6.6.8), the interpretation of Eq. (6.6.10) is as follows: Over the period 1984-85 to 2005-06, on average, expenditure on services increased at the absolute (note: not relative) rate of about 17,633.76 crore rupees per year. That is, there was an upward trend in the expenditure on services.

The choice between the growth rate model (6.6.8) and the linear trend model (6.6.10) will depend upon whether one is interested in the relative or absolute change in the expenditure on services, although for comparative purposes it is the relative change that is generally more relevant. In passing, observe that we cannot compare the $r^2$ values of models (6.6.8) and (6.6.10) because the regressands in the two models are different. We will show in Chapter 7 how one compares the $r^2$ values of models like (6.6.8) and (6.6.10).


The Lin-Log Model 


Unlike the growth model just discussed, in which we were interested in finding the percent growth in Y for 
an absolute change in X, suppose we now want to find the absolute change in Y for a percent change in X. A 
model that can accomplish this purpose can be written as: 

$$Y_i = \beta_1 + \beta_2 \ln X_i + u_i \tag{6.6.11}$$


For descriptive purposes we call such a model a lin-log model. 
Let us interpret the slope coefficient $\beta_2$.¹⁷ As usual,

$$\beta_2 = \frac{\text{Change in } Y}{\text{Change in } \ln X} = \frac{\text{Change in } Y}{\text{Relative change in } X}$$

The second step follows from the fact that a change in the log of a number is a relative change.


¹⁷Again, using differential calculus, we have

$$\frac{dY}{dX} = \beta_2\left(\frac{1}{X}\right)$$

Therefore,

$$\beta_2 = \frac{dY}{dX/X}$$
Symbolically, we have

$$\beta_2 = \frac{\Delta Y}{\Delta X / X} \tag{6.6.12}$$

where, as usual, Δ denotes a small change. Equation 6.6.12 can be written, equivalently, as

$$\Delta Y = \beta_2 (\Delta X / X) \tag{6.6.13}$$


This equation states that the absolute change in Y ($= \Delta Y$) is equal to slope times the relative change in X. If the latter is multiplied by 100, then Eq. (6.6.13) gives the absolute change in Y for a percentage change in X. Thus, if $(\Delta X/X)$ changes by 0.01 unit (or 1 percent), the absolute change in Y is $0.01(\beta_2)$; if in an application one finds that $\beta_2 = 500$, the absolute change in Y is (0.01)(500) = 5.0. Therefore, when regression (6.6.11) is estimated by OLS, do not forget to multiply the value of the estimated slope coefficient by 0.01, or, what amounts to the same thing, divide it by 100. If you do not keep this in mind, your interpretation in an application will be highly misleading.
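Eq. (6.6.13) is an approximation that works well for small relative changes. The plain-Python sketch below compares it with the exact change implied by Eq. (6.6.11), $\beta_2[\ln(X + \Delta X) - \ln X] = \beta_2 \ln(1 + \Delta X/X)$, using the text's illustrative value $\beta_2 = 500$. The function name is hypothetical.

```python
import math


def lin_log_change(beta2, pct_change_x):
    """Change in Y implied by Y = b1 + beta2*ln X when X rises by pct_change_x percent.

    Returns (approximate change from Eq. 6.6.13, exact change from Eq. 6.6.11)."""
    rel = pct_change_x / 100.0                 # relative change in X
    approx = beta2 * rel                       # Delta Y = beta2 * (Delta X / X)
    exact = beta2 * math.log1p(rel)            # beta2 * [ln(X + Delta X) - ln X]
    return approx, exact


for pct in (1, 10, 50):
    approx, exact = lin_log_change(500, pct)
    print(f"{pct:>3}% rise in X: approx {approx:.3f}, exact {exact:.3f}")
```

For a 1 percent rise in X the approximation gives 5.0, while the exact change is about 4.975. The gap widens as the percentage change grows, so the 0.01 rule is best reserved for small changes.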

The practical question is: When is a lin-log model like Eq. (6.6.11) useful? An interesting application has been found in the so-called Engel expenditure models, named after the German statistician Ernst Engel, 1821-1896. (See Exercise 6.10.) Engel postulated that "the total expenditure that is devoted to food tends to increase in arithmetic progression as total expenditure increases in geometric progression."¹⁸


Example 6.5 


As an illustration of the lin-log model, let us revisit our example on food expenditure in India, Example 3.2. There we fitted a linear-in-variables model as a first approximation. But if we plot the data we obtain the plot in Figure 6.5. As this figure suggests, food expenditure increases more slowly as total expenditure increases, perhaps giving credence to Engel's law. The results of fitting the lin-log model to the data are as follows:

$$\widehat{\text{FoodExp}}_i = -1283.912 + 257.2700 \ln \text{TotalExp}_i \qquad t = (-4.3848)^*\;(5.6625)^* \qquad r^2 = 0.3769 \tag{6.6.14}$$

Note: * denotes an extremely small p value.


Interpreted in the manner described earlier, the slope coefficient of about 257 means that an increase in total expenditure of 1 percent, on average, leads to an increase of about 2.57 rupees in the expenditure on food of the 55 families included in the sample. (Note: We have divided the estimated slope coefficient by 100.)

Before proceeding further, note that if you want to compute the elasticity coefficient for the log-lin or lin-log models, you can do so from the definition of the elasticity coefficient given before, namely,

$$\text{Elasticity} = \frac{dY}{dX}\,\frac{X}{Y}$$

As a matter of fact, once the functional form of a model is known, one can compute elasticities by applying the preceding definition. (Table 6.6, given later, summarizes the elasticity coefficients for the various models.)

Figure 6.5 (horizontal axis: Total expenditure (Rs.))
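To make the interpretation concrete, the sketch below plugs the estimates of Eq. (6.6.14) into the lin-log formulas: the effect of a 1 percent rise in total expenditure is $\hat\beta_2/100$, and the elasticity at a point is $\hat\beta_2/Y$. The evaluation point of Rs. 700 is a hypothetical value chosen only for illustration, and the function names are ours.

```python
import math

# Estimates from the lin-log food expenditure regression, Eq. (6.6.14)
B1, B2 = -1283.912, 257.2700

def food_exp_hat(total_exp: float) -> float:
    """Fitted food expenditure (Rs.) at a given total expenditure (Rs.)."""
    return B1 + B2 * math.log(total_exp)

def linlog_elasticity(total_exp: float) -> float:
    """Elasticity (dY/dX)(X/Y) for Y = B1 + B2 ln X, which reduces to B2 / Y."""
    slope = B2 / total_exp                  # dY/dX for the lin-log model
    return slope * total_exp / food_exp_hat(total_exp)

x0 = 700.0   # hypothetical total expenditure, for illustration only
print(f"1% rise in total expenditure -> about {B2 / 100:.2f} rupees more on food")
print(f"fitted food expenditure at Rs. {x0:.0f}: {food_exp_hat(x0):.2f}")
print(f"elasticity at that point: {linlog_elasticity(x0):.3f}")
```

Because the elasticity $\hat\beta_2/Y$ depends on Y, it changes along the fitted curve; Table 6.6 flags this case as a variable elasticity.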




¹⁸See Chandan Mukherjee, Howard White, and Marc Wuyts, Econometrics and Data Analysis for Developing Countries, Routledge, London, 1998, p. 158. This quote is attributed to H. Working, “Statistical Laws of Family Expenditure,” Journal of the American Statistical Association, vol. 38, 1943, pp. 43-56.
It may be noted that a logarithmic transformation is sometimes used to reduce heteroscedasticity as well as skewness. (See Chapter 11.) A common feature of many economic variables is that they are positively skewed (e.g., the size distribution of firms or the distribution of income or wealth) and heteroscedastic. A logarithmic transformation of such variables reduces both skewness and heteroscedasticity. That is why labor economists often use the logarithm of wages in the regression of wages on, say, schooling, as measured by years of education.
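The skewness-reducing effect of the log transformation is easy to check by simulation. The sketch below assumes a hypothetical income-like variable drawn from a log-normal distribution (a common stand-in for positively skewed economic data) and compares the moment coefficient of skewness before and after taking logs. It illustrates skewness only; heteroscedasticity is not simulated here.

```python
import math
import random
import statistics

def skewness(data):
    """Moment coefficient of skewness: the average cubed z-score."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return sum(((v - m) / s) ** 3 for v in data) / len(data)

rng = random.Random(42)   # fixed seed so the sketch is reproducible
# Hypothetical income-like variable drawn from a log-normal distribution
x = [rng.lognormvariate(10.0, 1.0) for _ in range(5000)]
log_x = [math.log(v) for v in x]

print(f"skewness of X:    {skewness(x):6.2f}")
print(f"skewness of ln X: {skewness(log_x):6.2f}")
```

The raw variable is strongly right-skewed, while its logarithm is roughly symmetric, since by construction ln X is normal.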


6.7 Reciprocal Models 


Models of the following type are known as reciprocal models:

$$Y_i = \beta_1 + \beta_2\left(\frac{1}{X_i}\right) + u_i \tag{6.7.1}$$

Although this model is nonlinear in the variable X because it enters inversely or reciprocally, the model is linear in $\beta_1$ and $\beta_2$ and is therefore a linear regression model.¹⁹

This model has these features: As X increases indefinitely, the term $\beta_2(1/X)$ approaches zero (note: $\beta_2$ is a constant) and Y approaches the limiting or asymptotic value $\beta_1$.

Therefore, models like (6.7.1) have built into them an asymptote or limit value that the dependent variable will take when the value of the X variable increases indefinitely.²⁰ Some likely shapes of the curve corresponding to Eq. (6.7.1) are shown in Figure 6.6.
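Because the model is linear in the parameters, it can be estimated by ordinary OLS once the regressor is transformed to $X^* = 1/X$. The sketch below does exactly that on hypothetical, noise-free data generated from $Y = 2 + 10/X$, so the true values $\beta_1 = 2$ and $\beta_2 = 10$ should be recovered; the function names are illustrative.

```python
def fit_reciprocal(x, y):
    """OLS estimates (b1, b2) for Y = b1 + b2*(1/X) + u.

    The model is linear in the parameters, so we simply regress Y on the
    transformed regressor X* = 1/X.
    """
    xs = [1.0 / v for v in x]               # X* = 1/X
    n = len(xs)
    mx, my = sum(xs) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, y))
    sxx = sum((a - mx) ** 2 for a in xs)
    b2 = sxy / sxx
    b1 = my - b2 * mx
    return b1, b2

def reciprocal_slope(b2, x):
    """dY/dX = -b2 / X**2: negative throughout when b2 > 0."""
    return -b2 / x ** 2

# Hypothetical noise-free data generated from Y = 2 + 10/X
x = [1, 2, 4, 5, 10, 20, 50, 100]
y = [2 + 10 / v for v in x]
b1, b2 = fit_reciprocal(x, y)
print(round(b1, 6), round(b2, 6))           # 2.0 10.0
print(b1 + b2 / 1e6)                        # approaches the asymptote b1 as X grows
```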


Figure 6.6 The reciprocal model: $Y = \beta_1 + \beta_2\left(\frac{1}{X}\right)$. Panels: (a) $\beta_2 > 0$, $\beta_1 > 0$; (b) $\beta_2 > 0$, $\beta_1 < 0$; (c) $\beta_2 < 0$.


Example 6.6 


As an illustration of Figure 6.6a, consider the data given in Table 6.4. These are cross-sectional data for 64 
countries on child mortality and a few other variables. For now, concentrate on the variables child mortality 
(CM) and per capita GNP, which are plotted in Figure 6.7.

¹⁹If we let $X_i^* = 1/X_i$, then Eq. (6.7.1) is linear in the parameters as well as in the variables $Y_i$ and $X_i^*$.

²⁰The slope of Eq. (6.7.1) is $dY/dX = -\beta_2(1/X^2)$, implying that if $\beta_2$ is positive, the slope is negative throughout, and if $\beta_2$ is negative, the slope is positive throughout. See Figures 6.6a and 6.6c, respectively.
Table 6.4 Fertility and Other Data for 64 Countries

Observation  CM    FLFP  PGNP   TFR     Observation  CM    FLFP  PGNP   TFR
1            128   37    1870   6.66    33           142   50    8640   7.17
2            204   22    130    6.15    34           104   62    350    6.60
3            202   16    310    7.00    35           287   31    230    7.00
4            197   65    570    6.25    36           41    66    1620   3.91
5            96    76    2050   3.81    37           312   11    190    6.70
6            209   26    200    6.44    38           77    88    2090   4.20
7            170   45    670    6.19    39           142   22    900    5.43
8            240   29    300    5.89    40           262   22    230    6.50
9            241   11    120    5.89    41           [?]   [?]   140    6.25
10           55    53    290    2.36    42           246   9     330    7.10
11           75    87    1180   3.93    43           191   31    1010   7.10
12           129   55    900    5.99    44           182   29    300    7.00
13           24    93    1730   3.50    45           37    88    1730   3.46
14           165   31    [?]    [?]     46           103   35    780    5.66
15           94    77    1160   4.21    47           67    85    1300   4.82
16           96    80    1270   5.00    48           143   78    930    5.00
17           148   30    580    5.27    49           83    85    690    4.74
18           98    69    660    5.21    50           223   33    200    8.49
19           161   43    420    6.50    51           240   19    450    6.50
20           118   47    1080   6.12    52           312   21    280    6.50
21           269   17    290    6.19    53           12    79    4430   1.69
22           189   35    270    5.05    54           52    83    270    3.25
23           126   58    560    6.16    55           79    43    1340   7.17
24           12    81    4240   1.80    56           61    88    670    3.52
25           167   29    240    4.75    57           168   28    410    6.09
26           [?]   65    430    4.10    58           28    95    4370   2.86
27           107   87    3020   6.66    59           121   41    1310   4.88
28           72    63    1420   7.28    60           115   62    1470   3.89
29           128   49    420    8.12    61           186   45    300    6.90
30           27    63    19880  5.23    62           47    85    3630   4.10
31           152   84    420    5.79    63           178   45    220    6.09
32           224   23    530    6.50    64           142   67    560    7.20


Note: CM = Child mortality, the number of deaths of children under age 5 in a year per 1000 live births. 
FLFP = Female literacy rate, percent. 
PGNP = per capita GNP in 1980. 
TFR = total fertility rate, 1980-1985, the average number of children born to a woman, using age-specific fertility 
rates for a given year. 


Source: Chandan Mukherjee, Howard White, and Marc Wuyts, Econometrics and Data Analysis for Developing Countries, Routledge, London, 1998, p. 456.


As you can see, this figure resembles Figure 6.6a: As per capita GNP increases, one would expect child mortality to decrease because people can afford to spend more on health care, assuming all other factors remain constant. But the relationship is not a straight-line one: As per capita GNP increases, initially there is a dramatic drop in CM, but the drop tapers off as per capita GNP continues to increase.
If we try to fit the reciprocal model (6.7.1), we obtain the following regression results:

$$
\begin{aligned}
\widehat{CM}_i &= 81.79436 + 27{,}273.17\left(\frac{1}{PGNP_i}\right) \\
se &= (10.8321) \qquad (3759.999) \\
t &= (7.5511) \qquad (7.2535) \qquad r^2 = 0.4590
\end{aligned} \tag{6.7.2}
$$

As per capita GNP increases indefinitely, child mortality approaches its asymptotic value of about 82 deaths per thousand. As explained in footnote 20, the positive value of the coefficient of $(1/PGNP_i)$ implies that the rate of change of CM with respect to PGNP is negative.
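The fitted curve in Eq. (6.7.2) can be evaluated directly to see the asymptote and the negative slope at work. In this sketch the slope coefficient is taken as 27,273.17, the value consistent with the reported standard error and t ratio (3759.999 × 7.2535 ≈ 27,273); the PGNP values are illustrative choices, not observations from Table 6.4.

```python
B1 = 81.79436     # intercept of Eq. (6.7.2): asymptotic child mortality
B2 = 27273.17     # slope; se * t = 3759.999 * 7.2535 is about 27,273

def cm_hat(pgnp: float) -> float:
    """Fitted child mortality (deaths per 1000 live births) from Eq. (6.7.2)."""
    return B1 + B2 / pgnp

def cm_slope(pgnp: float) -> float:
    """dCM/dPGNP = -B2 / PGNP**2, negative because B2 > 0."""
    return -B2 / pgnp ** 2

for pgnp in (200, 1000, 5000, 20000):       # illustrative per capita GNP values
    print(f"PGNP {pgnp:>6}: CM-hat {cm_hat(pgnp):7.2f}, slope {cm_slope(pgnp):.6f}")
```

The fitted values fall steeply at low PGNP and then level off near 82, mirroring the "dramatic drop that tapers off" described above.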


Figure 6.7 Relationship between child mortality and per capita GNP in 64 countries. (Horizontal axis: PGNP; vertical axis: child mortality.)

One of the important applications of Figure 6.6b is the celebrated Phillips curve of macroeconomics. Using the data on percent rate of change of money wages (Y) and the unemployment rate (X) for the United Kingdom for the period 1861-1957, Phillips obtained a curve whose general shape resembles Figure 6.6b (Figure 6.8).²¹


Figure 6.8 The Phillips curve. (Vertical axis: rate of change of money wages, %; horizontal axis: unemployment rate, %; marked on the graph: the natural rate of unemployment and the asymptote $-\beta_1$.)


As Figure 6.8 shows, there is an asymmetry in the response of wage changes to the level of the unemployment rate: Wages rise faster for a unit change in unemployment if the unemployment rate is below $U^N$, which economists call the natural rate of unemployment (defined as the rate of unemployment required to keep [wage] inflation constant), and they fall slowly for an equivalent change when the unemployment rate is above the natural rate, $U^N$, indicating the asymptotic floor, or $-\beta_1$, for wage change. This particular feature of the Phillips curve may be due to institutional factors, such as union bargaining power, minimum wages, and unemployment compensation.


²¹A. W. Phillips, “The Relationship between Unemployment and the Rate of Change of Money Wages in the United Kingdom, 1861-1957,” Economica, November 1958, vol. 25, pp. 283-299. Note that the original curve did not cross the unemployment rate axis, but Fig. 6.8 represents a later version of the curve.
Since the publication of Phillips’s article, there has been very extensive research on the Phillips curve at the theoretical as well as empirical levels. Space does not permit us to go into the details of the controversy surrounding the Phillips curve. The Phillips curve itself has gone through several incarnations. A comparatively recent formulation is provided by Olivier Blanchard.²² If we let $\pi_t$ denote the inflation rate at time t, which is defined as the percentage change in the price level as measured by a representative price index, such as the Consumer Price Index (CPI), and $UN_t$ denote the unemployment rate at time t, then a modern version of the Phillips curve can be expressed in the following format:


$$\pi_t - \pi_t^e = \beta_2\left(UN_t - U^N\right) + u_t \tag{6.7.3}$$


where $\pi_t$ = actual inflation rate at time t
$\pi_t^e$ = expected inflation rate at time t, the expectation being formed in year $(t-1)$
$UN_t$ = actual unemployment rate prevailing at time t
$U^N$ = natural rate of unemployment
$u_t$ = stochastic error term²³

Since $\pi_t^e$ is not directly observable, as a starting point one can make the simplifying assumption that $\pi_t^e = \pi_{t-1}$; that is, the inflation rate expected this year is the inflation rate that prevailed in the last year. Of course, more complicated assumptions about expectations formation can be made, and we will discuss this topic in Chapter 17, on distributed lag models.
Substituting this assumption into Eq. (6.7.3) and writing the regression model in the standard form, we 
obtain the following estimating equation: 


$$\pi_t - \pi_{t-1} = \beta_1 + \beta_2 UN_t + u_t \tag{6.7.4}$$


where $\beta_1 = -\beta_2 U^N$. Equation (6.7.4) states that the change in the inflation rate between two time periods is linearly related to the current unemployment rate. A priori, $\beta_2$ is expected to be negative (why?) and $\beta_1$ is expected to be positive (this figures, since $\beta_2$ is negative and $U^N$ is positive).

Incidentally, the Phillips relationship given in Eq. (6.7.3) is known in the literature as the modified Phillips curve, or the expectations-augmented Phillips curve (to indicate that $\pi_{t-1}$ stands for expected inflation), or the accelerationist Phillips curve (to suggest that a low unemployment rate leads to an increase in the inflation rate and hence an acceleration of the price level).


Example 6.7 


As an illustration of the modified Phillips curve, we present in Table 6.5 data on inflation as measured by the year-to-year percentage change in the Consumer Price Index (CPI inflation) and the unemployment rate for the period 1960-2006. The unemployment rate represents the civilian unemployment rate. From these data we obtained the change in the inflation rate $(\pi_t - \pi_{t-1})$ and plotted it against the civilian unemployment rate; we are using the CPI as a measure of inflation. The resulting graph appears in Figure 6.9.

As expected, the relation between the change in inflation rate and the unemployment rate is negative—a 
low unemployment rate leads to an increase in the inflation rate and therefore an acceleration of the price 
level, hence the name accelerationist Phillips curve. 

Looking at Figure 6.9, it is not obvious whether a linear (straight-line) regression model or a reciprocal model fits the data better; there may be a curvilinear relationship between the two variables. We present below regressions based on both models. However, keep in mind that for the reciprocal model the intercept term is expected to be negative and the slope positive, as noted in footnote 20.


Linear model:

$$
\begin{aligned}
\widehat{(\pi_t - \pi_{t-1})} &= 3.7844 - 0.6385\,UN_t \\
t &= (4.1912) \qquad (-4.2756) \qquad r^2 = 0.2935
\end{aligned} \tag{6.7.5}
$$


²²See Olivier Blanchard, Macroeconomics, Prentice Hall, Englewood Cliffs, NJ, 1997, Chap. 17.


²³Economists believe this error term represents some kind of supply shock, such as the OPEC oil embargoes of 1973 and 1979.
Reciprocal model:

$$
\begin{aligned}
\widehat{(\pi_t - \pi_{t-1})} &= -3.0684 + 17.2077\left(\frac{1}{UN_t}\right) \\
t &= (-3.1635) \qquad (3.2886) \qquad r^2 = 0.1973
\end{aligned} \tag{6.7.6}
$$


All the estimated coefficients in both the models are individually statistically significant, all the p values being 
lower than the 0.005 level. 


Table 6.5 Inflation Rate and Unemployment Rate, United States, 1960-2006 (For all urban consumers; 1982-1984 = 100, except as noted)

Year   INFLRATE   UNRATE      Year   INFLRATE   UNRATE
1960   1.718      5.5         1984   4.317      7.5
1961   1.014      6.7         1985   3.561      7.2
1962   1.003      5.5         1986   1.859      7.0
1963   1.325      5.7         1987   3.650      6.2
1964   1.307      5.2         1988   4.137      5.5
1965   1.613      4.5         1989   4.818      5.3
1966   2.857      3.8         1990   5.403      5.6
1967   3.086      3.8         1991   4.208      6.8
1968   4.192      3.6         1992   3.010      7.5
1969   5.460      3.5         1993   2.994      6.9
1970   5.722      4.9         1994   2.561      6.1
1971   4.381      5.9         1995   2.834      5.6
1972   3.210      5.6         1996   2.953      5.4
1973   6.220      4.9         1997   2.294      4.9
1974   11.036     5.6         1998   1.558      4.5
1975   9.128      8.5         1999   2.209      4.2
1976   5.762      7.7         2000   3.361      4.0
1977   6.503      7.1         2001   2.846      4.7
1978   7.591      6.1         2002   1.581      5.8
1979   11.350     5.8         2003   2.279      6.0
1980   13.499     7.1         2004   2.663      5.5
1981   10.316     7.6         2005   3.388      5.1
1982   6.161      9.7         2006   3.226      4.6
1983   3.212      9.6


Note: The inflation rate is the percent year-to-year change in CPI. The unemployment rate is the civilian unemployment rate. 


Source: Economic Report of the President, 2007, Table B-60, p. 399, for CPI changes and Table B-42, p. 376, for the unemployment rate.


Model (6.7.5) shows that if the unemployment rate goes down by 1 percentage point, on average, the change in the inflation rate goes up by about 0.64 percentage points, and vice versa. Model (6.7.6) shows that even if the unemployment rate increases indefinitely, the change in the inflation rate will go down by at most about 3.07 percentage points. Incidentally, from Eq. (6.7.5), we can compute the underlying natural rate of unemployment as:
natural rate of unemployment as: 


$$\hat{U}^N = \frac{\hat{\beta}_1}{-\hat{\beta}_2} = \frac{3.7844}{0.6385} = 5.9270 \tag{6.7.7}$$


That is, the natural rate of unemployment is about 5.93%. Economists put the natural rate between 5 and 
6%, although in the recent past in the United States the actual rate has been much below this rate. 
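The calculations behind Eqs. (6.7.5) to (6.7.7) can be sketched in a few lines. This is an illustration, not the author's procedure: it hard-codes the Table 6.5 values as transcribed here, forms $\pi_t - \pi_{t-1}$ for 1961-2006, fits the linear and reciprocal models by two-variable OLS, and computes $\hat U^N = \hat\beta_1 / (-\hat\beta_2)$. If the transcription matches the source, the printed coefficients should be close to those reported above; any small differences would reflect rounding in the printed table.

```python
# Table 6.5 (as transcribed): CPI inflation, % (INFL) and civilian
# unemployment rate, % (UNRATE), United States, 1960-2006
INFL = [1.718, 1.014, 1.003, 1.325, 1.307, 1.613, 2.857, 3.086, 4.192, 5.460,
        5.722, 4.381, 3.210, 6.220, 11.036, 9.128, 5.762, 6.503, 7.591, 11.350,
        13.499, 10.316, 6.161, 3.212, 4.317, 3.561, 1.859, 3.650, 4.137, 4.818,
        5.403, 4.208, 3.010, 2.994, 2.561, 2.834, 2.953, 2.294, 1.558, 2.209,
        3.361, 2.846, 1.581, 2.279, 2.663, 3.388, 3.226]
UNRATE = [5.5, 6.7, 5.5, 5.7, 5.2, 4.5, 3.8, 3.8, 3.6, 3.5,
          4.9, 5.9, 5.6, 4.9, 5.6, 8.5, 7.7, 7.1, 6.1, 5.8,
          7.1, 7.6, 9.7, 9.6, 7.5, 7.2, 7.0, 6.2, 5.5, 5.3,
          5.6, 6.8, 7.5, 6.9, 6.1, 5.6, 5.4, 4.9, 4.5, 4.2,
          4.0, 4.7, 5.8, 6.0, 5.5, 5.1, 4.6]

def ols(x, y):
    """Intercept and slope of the two-variable OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

d_infl = [INFL[t] - INFL[t - 1] for t in range(1, len(INFL))]  # pi_t - pi_{t-1}, 1961-2006
un = UNRATE[1:]                                                 # UN_t, 1961-2006

b1, b2 = ols(un, d_infl)                        # linear model, cf. Eq. (6.7.5)
c1, c2 = ols([1.0 / u for u in un], d_infl)     # reciprocal model, cf. Eq. (6.7.6)
natural_rate = b1 / -b2                         # Eq. (6.7.7)

print(f"linear:     b1 = {b1:.4f}, b2 = {b2:.4f}, natural rate = {natural_rate:.2f}%")
print(f"reciprocal: b1 = {c1:.4f}, b2 = {c2:.4f}")
```

One observation is lost in forming the first difference of inflation, so both regressions use 46 data points.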
Figure 6.9 The modified Phillips curve. (Vertical axis: change in inflation rate; horizontal axis: unemployment rate, %.)


Log Hyperbola or Logarithmic Reciprocal Model 


We conclude our discussion of reciprocal models by considering the logarithmic reciprocal model, which takes the following form:

$$\ln Y_i = \beta_1 - \beta_2\left(\frac{1}{X_i}\right) + u_i \tag{6.7.8}$$


Its shape is as depicted in Figure 6.10. As this figure shows, initially Y increases at an increasing rate (i.e., the curve is initially convex) and then it increases at a decreasing rate (i.e., the curve becomes concave).²⁴ Such a model may therefore be appropriate to model a short-run production function. Recall from microeconomics that if labor and capital are the inputs in a production function and if we keep the capital input constant but increase the labor input, the short-run output-labor relationship will resemble Figure 6.10. (See Example 7.3, Chapter 7.)

Figure 6.10 The log reciprocal model.
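Where the switch from convex to concave occurs can be worked out by differentiating the slope $dY/dX = \beta_2 Y / X^2$ (footnote 24) once more: $d^2Y/dX^2 = \beta_2 Y(\beta_2 - 2X)/X^4$, which changes sign at $X = \beta_2/2$ when $\beta_2 > 0$. The sketch below checks this numerically for hypothetical parameter values $\beta_1 = 1$, $\beta_2 = 4$, so the inflection point should be at $X = 2$.

```python
import math

B1, B2 = 1.0, 4.0   # hypothetical parameters of ln Y = B1 - B2(1/X), with B2 > 0

def y_level(x: float) -> float:
    """Level form of the log reciprocal model: Y = exp(B1 - B2/X)."""
    return math.exp(B1 - B2 / x)

def slope(x: float) -> float:
    """dY/dX = B2 * Y / X**2: positive for every X > 0."""
    return B2 * y_level(x) / x ** 2

def curvature(x: float) -> float:
    """d2Y/dX2 = B2 * Y * (B2 - 2X) / X**4; changes sign at X = B2/2."""
    return B2 * y_level(x) * (B2 - 2 * x) / x ** 4

for x in (1.0, 1.5, 2.5, 4.0):
    shape = "convex" if curvature(x) > 0 else "concave"
    print(f"X = {x}: slope {slope(x):.4f}, {shape}")
```

The slope stays positive throughout, while the curvature is positive to the left of $X = 2$ and negative to the right, matching the shape described for Figure 6.10.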


6.8 Choice of Functional Form 


In this chapter we discussed several functional forms an empirical model can assume, even within the 
confines of the linear-in-parameter regression models. The choice of a particular functional form may be 


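²⁴From calculus, it can be shown that

$$\frac{d}{dX}(\ln Y) = \beta_2\left(\frac{1}{X^2}\right)$$

But

$$\frac{d}{dX}(\ln Y) = \frac{1}{Y}\,\frac{dY}{dX}$$

Making this substitution, we obtain

$$\frac{dY}{dX} = \beta_2\,\frac{Y}{X^2}$$

which is the slope of Y with respect to X.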
comparatively easy in the two-variable case, because we can plot the variables and get some rough idea about the appropriate model. The choice becomes much harder when we consider the multiple regression model involving more than one regressor, as we will discover when we discuss this topic in the next two chapters. There is no denying that a great deal of skill and experience is required in choosing an appropriate model for empirical estimation. But some guidelines can be offered:

1. The underlying theory (e.g., the Phillips curve) may suggest a particular functional form. 

2. It is good practice to find out the rate of change (i.e., the slope) of the regressand with respect to the 
regressor as well as to find out the elasticity of the regressand with respect to the regressor. For the various 
models considered in this chapter, we provide the necessary formulas for the slope and elasticity coefficients 
of the various models in Table 6.6. The knowledge of these formulas will help us to compare the various 
models. 

3. The coefficients of the model chosen should satisfy certain a priori expectations. For example, if we are 
considering the demand for automobiles as a function of price and some other variables, we should expect a 
negative coefficient for the price variable. 

4. Sometimes more than one model may fit a given set of data reasonably well. In the modified Phillips curve, we fitted both a linear and a reciprocal model to the same data. In both cases the coefficients were in line with prior expectations and they were all statistically significant. One major difference was that the $r^2$ value of the linear model was larger than that of the reciprocal model. One may therefore give a slight edge to the linear model over the reciprocal model. But make sure that in comparing two $r^2$ values the dependent variable, or the regressand, of the two models is the same; the regressor(s) can take any form. We will explain the reason for this in the next chapter.

5. In general one should not overemphasize the $r^2$ measure in the sense that the higher the $r^2$, the better the model. As we will discuss in the next chapter, $r^2$ does not decrease (and usually increases) as we add more regressors to the model. What is of greater importance is the theoretical underpinning of the chosen model, the signs of the estimated coefficients, and their statistical significance. If a model is good on these criteria, a model with a lower $r^2$ may be quite acceptable. We will revisit this important topic in greater depth in Chapter 13.


Table 6.6

Model            Equation                              Slope (= dY/dX)          Elasticity (= (dY/dX)(X/Y))
Linear           $Y = \beta_1 + \beta_2 X$             $\beta_2$                $\beta_2 (X/Y)$*
Log-linear       $\ln Y = \beta_1 + \beta_2 \ln X$     $\beta_2 (Y/X)$          $\beta_2$
Log-lin          $\ln Y = \beta_1 + \beta_2 X$         $\beta_2 (Y)$            $\beta_2 (X)$*
Lin-log          $Y = \beta_1 + \beta_2 \ln X$         $\beta_2 (1/X)$          $\beta_2 (1/Y)$*
Reciprocal       $Y = \beta_1 + \beta_2 (1/X)$         $-\beta_2 (1/X^2)$       $-\beta_2 (1/(XY))$*
Log reciprocal   $\ln Y = \beta_1 - \beta_2 (1/X)$     $\beta_2 (Y/X^2)$        $\beta_2 (1/X)$*

Note: * indicates that the elasticity is variable, depending on the value taken by X or Y or both. When no X and Y values are specified, in practice, very often these elasticities are measured at the mean values of these variables, namely, $\bar{X}$ and $\bar{Y}$.
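The slope formulas in Table 6.6 can be checked mechanically: write each model in level form, differentiate numerically, and compare; the elasticity column should then equal slope × X/Y. The sketch below does this at a single point with hypothetical parameter values $\beta_1 = 0.5$, $\beta_2 = 0.8$ and $X = 2$; nothing here is taken from a data set.

```python
import math

B1, B2 = 0.5, 0.8   # hypothetical parameter values, for illustration only

# For each model in Table 6.6: (Y as a function of X, slope formula, elasticity formula)
MODELS = {
    "Linear":         (lambda x: B1 + B2 * x,
                       lambda x, y: B2,
                       lambda x, y: B2 * x / y),
    "Log-linear":     (lambda x: math.exp(B1) * x ** B2,
                       lambda x, y: B2 * y / x,
                       lambda x, y: B2),
    "Log-lin":        (lambda x: math.exp(B1 + B2 * x),
                       lambda x, y: B2 * y,
                       lambda x, y: B2 * x),
    "Lin-log":        (lambda x: B1 + B2 * math.log(x),
                       lambda x, y: B2 / x,
                       lambda x, y: B2 / y),
    "Reciprocal":     (lambda x: B1 + B2 / x,
                       lambda x, y: -B2 / x ** 2,
                       lambda x, y: -B2 / (x * y)),
    "Log reciprocal": (lambda x: math.exp(B1 - B2 / x),
                       lambda x, y: B2 * y / x ** 2,
                       lambda x, y: B2 / x),
}

def numeric_slope(f, x, h=1e-6):
    """Central-difference approximation to dY/dX."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 2.0   # evaluation point (hypothetical)
for name, (level, slope, elast) in MODELS.items():
    y0 = level(x0)
    print(f"{name:15s} slope {slope(x0, y0):8.4f}  numeric {numeric_slope(level, x0):8.4f}"
          f"  elasticity {elast(x0, y0):8.4f}")
```

Only the log-linear model returns the same elasticity at every X; the others change with X, Y, or both, which is what the asterisks in the table indicate.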
6. In some situations it may not be easy to settle on a particular functional form, in which case we may
use the so-called Box-Cox transformations. Since this topic is rather technical, we discuss the Box-Cox 
procedure in Appendix 6A.5. 


*6.9 A Note on the Nature of the Stochastic Error Term: Additive versus Multiplicative Stochastic Error Term


Consider the following regression model, which is the same as Eq. (6.5.1) but without the error term:


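$$Y_i = \beta_1 X_i^{\beta_2} \tag{6.9.1}$$

For estimation purposes, we can express this model in three different forms:

$$Y_i = \beta_1 X_i^{\beta_2} u_i \tag{6.9.2}$$
$$Y_i = \beta_1 X_i^{\beta_2} e^{u_i} \tag{6.9.3}$$
$$Y_i = \beta_1 X_i^{\beta_2} + u_i \tag{6.9.4}$$

Taking the logarithms on both sides of these equations, we obtain

$$\ln Y_i = \alpha + \beta_2 \ln X_i + \ln u_i \tag{6.9.2a}$$
$$\ln Y_i = \alpha + \beta_2 \ln X_i + u_i \tag{6.9.3a}$$
$$\ln Y_i = \ln\left(\beta_1 X_i^{\beta_2} + u_i\right) \tag{6.9.4a}$$

where $\alpha = \ln \beta_1$.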

Models like Eq. (6.9.2) are intrinsically linear (in-parameter) regression models in the sense that by suitable (log) transformation the models can be made linear in the parameters $\alpha$ and $\beta_2$. (Note: These models are nonlinear in $\beta_1$.) But model (6.9.4) is intrinsically nonlinear-in-parameter. There is no simple way to take the log of Eq. (6.9.4) because $\ln(A + B) \neq \ln A + \ln B$.

Although Eqs. (6.9.2) and (6.9.3) are linear regression models and can be estimated by ordinary least squares (OLS) or maximum likelihood (ML), we have to be careful about the properties of the stochastic error term that enters these models. Remember that the BLUE property of OLS (best linear unbiased estimator) requires that $u_i$ has zero mean value, constant variance, and zero autocorrelation. For hypothesis testing, we further assume that $u_i$ follows the normal distribution with mean and variance values just discussed. In short, we have assumed that $u_i \sim N(0, \sigma^2)$.

Now consider model (6.9.2). Its statistical counterpart is given in (6.9.2a). To use the classical normal linear regression model (CNLRM), we have to assume that

$$\ln u_i \sim N(0, \sigma^2) \tag{6.9.5}$$

Therefore, when we run the regression (6.9.2a), we will have to apply the normality tests discussed in Chapter 5 to the residuals obtained from this regression. Incidentally, note that if $\ln u_i$ follows the normal distribution with zero mean and constant variance, then statistical theory shows that $u_i$ in Eq. (6.9.2) must follow the log-normal distribution with mean $e^{\sigma^2/2}$ and variance $e^{\sigma^2}\left(e^{\sigma^2} - 1\right)$.
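The stated mean and variance of the log-normal error can be checked by a quick simulation. The sketch assumes a hypothetical value $\sigma = 0.5$, draws $\ln u_i$ from $N(0, \sigma^2)$, exponentiates, and compares the sample moments of $u_i$ with $e^{\sigma^2/2}$ and $e^{\sigma^2}(e^{\sigma^2} - 1)$.

```python
import math
import random
import statistics

sigma = 0.5                       # hypothetical standard deviation of ln u
rng = random.Random(0)            # fixed seed for reproducibility
ln_u = [rng.gauss(0.0, sigma) for _ in range(100_000)]   # ln u ~ N(0, sigma^2)
u = [math.exp(v) for v in ln_u]                          # so u is log-normal

mean_u = statistics.fmean(u)
var_u = statistics.fmean([(v - mean_u) ** 2 for v in u])  # population-style variance

theory_mean = math.exp(sigma ** 2 / 2)
theory_var = math.exp(sigma ** 2) * (math.exp(sigma ** 2) - 1)
print(f"mean: simulated {mean_u:.4f}, theory {theory_mean:.4f}")
print(f"var:  simulated {var_u:.4f}, theory {theory_var:.4f}")
```

Note that the mean of $u_i$ exceeds 1 even though $\ln u_i$ has mean zero; this is one reason the error term deserves attention when a model is transformed.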

As the preceding analysis shows, one has to pay very careful attention to the error term in transforming a model for regression analysis. As for Eq. (6.9.4), this model is a nonlinear-in-parameter regression model and will have to be solved by some iterative computer routine. Model (6.9.3) should not pose any problems for estimation.
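What such an iterative routine does can be sketched with the Gauss-Newton method, one common choice, used here only as an illustration and not as the specific method the text has in mind. The hypothetical function `fit_power_nls` takes starting values from the log-log OLS regression and then repeatedly linearizes Eq. (6.9.4) around the current estimates. The data are hypothetical: $Y = 2X^{0.5}$ plus a small additive error, as in Eq. (6.9.4).

```python
import math

def ols(x, y):
    """Intercept and slope of the two-variable OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_power_nls(x, y, iters=100, tol=1e-12):
    """Gauss-Newton estimates of (b1, b2) in Y = b1 * X**b2 + u, Eq. (6.9.4)."""
    # Starting values: OLS on the log-log form, ln Y = ln b1 + b2 ln X
    a, b2 = ols([math.log(v) for v in x], [math.log(v) for v in y])
    b1 = math.exp(a)
    for _ in range(iters):
        fitted = [b1 * v ** b2 for v in x]
        resid = [yi - fi for yi, fi in zip(y, fitted)]
        j1 = [v ** b2 for v in x]                               # dF/db1
        j2 = [fi * math.log(v) for fi, v in zip(fitted, x)]     # dF/db2
        # Gauss-Newton step: solve (J'J) d = J'r (2x2 system, Cramer's rule)
        a11 = sum(p * p for p in j1)
        a12 = sum(p * q for p, q in zip(j1, j2))
        a22 = sum(q * q for q in j2)
        g1 = sum(p * r for p, r in zip(j1, resid))
        g2 = sum(q * r for q, r in zip(j2, resid))
        det = a11 * a22 - a12 * a12
        d1 = (a22 * g1 - a12 * g2) / det
        d2 = (a11 * g2 - a12 * g1) / det
        b1, b2 = b1 + d1, b2 + d2
        if abs(d1) + abs(d2) < tol:
            break
    return b1, b2

# Hypothetical data: Y = 2 * X**0.5 plus a small additive error
x = list(range(1, 11))
y = [2 * v ** 0.5 + 0.05 * (-1) ** i for i, v in enumerate(x)]
b1_hat, b2_hat = fit_power_nls(x, y)
print(f"b1 = {b1_hat:.4f}, b2 = {b2_hat:.4f}")
```

At convergence the residuals are orthogonal to both columns of the Jacobian, which is the nonlinear least-squares analogue of the OLS normal equations.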


*Optional 
To sum up, pay very careful attention to the disturbance term when you transform a model for regression
analysis. Otherwise, a blind application of OLS to the transformed model will not produce a model with 
desirable statistical properties. 


Summary and Conclusions 


This chapter introduced several of the finer points of the classical linear regression model (CLRM). 


1. Sometimes a regression model may not contain an explicit intercept term. Such models are known as regression through the origin. Although the algebra of estimating such models is simple, one should use such models with caution. In such models the sum of the residuals $\sum \hat{u}_i$ need not be zero; additionally, the conventionally computed $r^2$ may not be meaningful. Unless there is a strong theoretical reason, it is better to introduce the intercept in the model explicitly.

2. The units and scale in which the regressand and the regressor(s) are expressed are very important because the interpretation of regression coefficients critically depends on them. In empirical research the researcher should not only quote the sources of data but also state explicitly how the variables are measured.

3. Just as important is the functional form of the relationship between the regressand and the regressor(s). Some of the important functional forms discussed in this chapter are (a) the log-linear or constant elasticity model, (b) semilog regression models, and (c) reciprocal models.

4. In the log-linear model both the regressand and the regressor(s) are expressed in the logarithmic form. The regression coefficient attached to the log of a regressor is interpreted as the elasticity of the regressand with respect to the regressor.

5. In the semilog model either the regressand or the regressor(s) are in the log form. In the semilog model where the regressand is logarithmic and the regressor X is time, the estimated slope coefficient (multiplied by 100) measures the (instantaneous) rate of growth of the regressand. Such models are often used to measure the growth rate of many economic phenomena. In the semilog model if the regressor is logarithmic, its coefficient measures the absolute rate of change in the regressand for a given percent change in the value of the regressor.

6. In the reciprocal models, either the regressand or the regressor is expressed in reciprocal, or inverse, form to capture nonlinear relationships between economic variables, as in the celebrated Phillips curve.

7. In choosing the various functional forms, great attention should be paid to the stochastic disturbance term $u_i$. As noted in Chapter 5, the CLRM explicitly assumes that the disturbance term has zero mean value and constant (homoscedastic) variance and that it is uncorrelated with the regressor(s). It is under these assumptions that the OLS estimators are BLUE. Further, under the CNLRM, the OLS estimators are also normally distributed. One should therefore find out if these assumptions hold in the functional form chosen for empirical analysis. After the regression is run, the researcher should apply diagnostic tests, such as the normality test, discussed in Chapter 5. This point cannot be overemphasized, for the classical tests of hypothesis, such as the t, F, and $\chi^2$, rest on the assumption that the disturbances are normally distributed. This is especially critical if the sample size is small.

8. Although the discussion so far has been confined to two-variable regression models, the subsequent chapters will show that in many cases the extension to multiple regression models simply involves more algebra without necessarily introducing more fundamental concepts. That is why it is so very important that the reader have a firm grasp of the two-variable regression model.
Multiple Choice Questions


1. For a regression through the origin, the intercept is equal to 
G Il 
Be 
c. 0 
d. —1 
2. For a regression through the origin, state which of these statements is NOT true
a. The $\sum \hat{u}_i$ need not be equal to zero
b. The raw $r^2$ computed satisfies $0 < r^2 < 1$ and is directly comparable to the conventionally computed $r^2$
c. In estimating the model, we use raw sum of squares and cross products
d. The conventionally computed $r^2$ may give negative value.
3. If we multiply both Y and X by 1000 and re-estimate the regression, the slope coefficient and its 
standard error will 
a. Increase by 1000 times 
b. Decrease by 1000 times 
c. Remain same 
d. Increase by (1/1000) times 
4. If we multiply both Y and X by 1000 and re-estimate the regression, the intercept coefficient and its 
standard error will 
a. Increase by 1000 times 
b. Decrease by 1000 times 
c. Remain same 
d. Increase by (1/1000) times 
5. If we multiply Y by 1000 and re-estimate the regression, the slope coefficient and its standard error will 
a. Increase by 1000 times 
b. Decrease by 1000 times 
c. Remain same 
d. Increase by (1/1000) times 
6. If we multiply Y by 1000 and re-estimate the regression, the intercept coefficient and its standard error 
will 
a. Increase by 1000 times 
b. Decrease by 1000 times 
c. Remain same 
d. Increase by (1/1000) times 
7. If we multiply X by 1000 and re-estimate the regression, the slope coefficient and its standard error will 
a. Increase by 1000 times 
b. Decrease by 1000 times 
c. Remain same 
d. Increase by (1/1000) times 
8. If we multiply X by 1000 and re-estimate the regression, the intercept coefficient and its standard error will
a. Increase by 1000 times
b. Decrease by 1000 times
c. Remain same
d. Increase by (1/1000) times
9. If in $Y_i = \beta_1 + \beta_2 X_i + u_i$ both Y and X are standardized variables, the intercept term will be
a. Positive
b. Negative
c. Between −1 and +1
d. Equal to zero
10. In double log regression model, the regression slope gives
a. The relative change in Y for an absolute change in X
b. The percentage change in Y for a given percentage change in X
c. The absolute change in Y for a percent change in X
d. By how many units Y changes for a unit change in X
11. In Log-Lin regression model, the slope coefficient gives
a. The relative change in Y for an absolute change in X
b. The percentage change in Y for a given percentage change in X
c. The absolute change in Y for a percent change in X
d. By how many units Y changes for a unit change in X
12. In Lin-Log regression model, the slope coefficient gives
a. The relative change in Y for an absolute change in X
b. The percentage change in Y for a given percentage change in X
c. The absolute change in Y for a percent change in X
d. By how many units Y changes for a unit change in X
13. In double log model, elasticity of Y with respect to X is given by
a. $\beta_2$
b. $\beta_2(X/Y)$
c. $\beta_2 X$
d. $\beta_2(1/Y)$
14. In Log-Lin model, elasticity of Y with respect to X is given by
a. $\beta_2$
b. $\beta_2(X/Y)$
c. $\beta_2 X$
d. $\beta_2(1/Y)$
15. In Lin-Log model, elasticity of Y with respect to X is given by
a. $\beta_2$
b. $\beta_2(X/Y)$
c. $\beta_2 X$
d. $\beta_2(1/Y)$
16. In linear model, elasticity of Y with respect to X is given by
a. $\beta_2$
b. $\beta_2(X/Y)$
c. $\beta_2 X$
d. $\beta_2(1/Y)$
17. You probably want to avoid log-log specifications if
a. It is possible for Y or X to take on zero or negative values
b. The elasticity of Y with respect to X is one
c. You have values of Y which are large and values of X which are small
d. You get the desired results using a linear model
18. Given the relationship between child mortality (CM) and per capita GNP (PGNP) in Eq. (6.7.2) for 64 countries, which of the statements given below is True?
a. Both the intercept and slope coefficients are statistically significantly different from zero. 
b. As per capita GNP increases by $1, child mortality increases by about 82 deaths per thousand. 
c. As per capita GNP increases by 1%, child mortality increases by 0.82%. 
d. The positive value of the coefficient of (1/PGNP) implies that the rate of change of CM with 
respect to PGNP is positive. 
19. The choice of functional form in a regression model depends on 
a. Only the underlying theory 
b. Only the rate of change of Y with respect to X
c. Whether the coefficients of the model chosen satisfy certain a priori expectations 
d. All of the above 
20. When comparing $r^2$ of two regression models, the models should have the same
a. X variables 
b. Y variables 
c. Error term 
d. Beta coefficients 


Exercises 


Questions 


6.1. Consider the regression model

$$y_i = \beta_1 + \beta_2 x_i + u_i$$

where $y_i = (Y_i - \bar{Y})$ and $x_i = (X_i - \bar{X})$. In this case, the regression line must pass through the origin. True or false? Show your calculations.
6.2. The following regression results were based on monthly data over the period January 1978 to December 1987:

$$
\begin{aligned}
\hat{Y}_t &= 0.00681 + 0.75815\,X_t \\
se &= (0.02596) \qquad (0.27009) \\
t &= (0.26229) \qquad (2.80700) \\
p\ \text{value} &= (0.7984) \qquad (0.0186) \qquad r^2 = 0.4406
\end{aligned}
$$

$$
\begin{aligned}
\hat{Y}_t &= 0.76214\,X_t \\
se &= (0.265799) \\
t &= (2.95408) \\
p\ \text{value} &= (0.0131) \qquad r^2 = 0.43684
\end{aligned}
$$

where Y = monthly rate of return on Texaco common stock, %, and X = monthly market rate of return, %.


*The underlying data were obtained from the data diskette included in Ernst R. Berndt, The Practice of Econometrics: Classic and Contemporary, Addison-Wesley, Reading, Mass., 1991.


a. What is the difference between the two regression models?

b. Given the preceding results, would you retain the intercept term in the first model? Why or why not?

c. How would you interpret the slope coefficients in the two models?

d. What is the theory underlying the two models?

e. Can you compare the $r^2$ terms of the two models? Why or why not?

f. The Jarque-Bera normality statistic for the first model in this problem is 1.1167 and for the second model it is 1.1170. What conclusions can you draw from these statistics?

g. The t value of the slope coefficient in the zero intercept model is about 2.95, whereas that with the intercept present is about 2.81. Can you rationalize this result?

6.3. Consider the following regression model:

$$\frac{1}{Y_i} = \beta_1 + \beta_2\left(\frac{1}{X_i}\right) + u_i$$

Note: Neither Y nor X assumes zero value.

a. Is this a linear regression model?

b. How would you estimate this model?

c. What is the behavior of Y as X tends to infinity?

d. Can you give an example where such a model may be appropriate?

6.4. Consider the log-linear model:

$$\ln Y_i = \beta_1 + \beta_2 \ln X_i + u_i$$

Plot Y on the vertical axis and X on the horizontal axis. Draw the curves showing the relationship between Y and X when $\beta_2 = 1$, when $\beta_2 > 1$, and when $\beta_2 < 1$.

6.5. Consider the following models:

$$\text{Model I: } Y_i = \beta_1 + \beta_2 X_i + u_i$$
$$\text{Model II: } Y_i^* = \alpha_1 + \alpha_2 X_i^* + u_i$$

where $Y^*$ and $X^*$ are standardized variables. Show that $\hat\alpha_2 = \hat\beta_2 (S_x/S_y)$ and hence establish that although the regression slope coefficients are independent of the change of origin they are not independent of the change of scale.

6.6. Consider the following models:

$$\ln Y_i^* = \alpha_1 + \alpha_2 \ln X_i^* + u_i^*$$
$$\ln Y_i = \beta_1 + \beta_2 \ln X_i + u_i$$

where $Y_i^* = w_1 Y_i$ and $X_i^* = w_2 X_i$, the w's being constants.

a. Establish the relationships between the two sets of regression coefficients and their standard errors.

b. Is the $r^2$ different between the two models?

6.7. Between regressions (6.6.8) and (6.6.10), which model do you prefer? Why?

6.8. For the regression (6.6.8), test the hypothesis that the slope coefficient is not significantly different from 0.005.

6.9. From the Phillips curve given in Eq. (6.7.3), is it possible to estimate the natural rate of unemployment? How?
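As a numerical check on the result stated in Exercise 6.5, the following Python sketch fits Model I and Model II by least squares to a small made-up dataset (the numbers are hypothetical, not from the text) and compares $\hat\alpha_2$ with $\hat\beta_2 S_x/S_y$. It uses only the standard library; `statistics.stdev` computes the sample standard deviation with divisor $n - 1$. The last line illustrates the change-of-origin versus change-of-scale point.

```python
import statistics

def ols_slope(x, y):
    """Least-squares slope of y on x (intercept included)."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

def standardize(v):
    """Return (v_i - mean) / S_v, where S_v is the sample standard deviation."""
    m, s = statistics.fmean(v), statistics.stdev(v)
    return [(vi - m) / s for vi in v]

x = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]      # hypothetical data
y = [3.1, 5.9, 6.2, 9.8, 11.0, 15.7]

beta2_hat = ols_slope(x, y)                              # Model I
alpha2_hat = ols_slope(standardize(x), standardize(y))   # Model II
implied = beta2_hat * statistics.stdev(x) / statistics.stdev(y)
print(f"alpha2_hat = {alpha2_hat:.6f}, beta2_hat * Sx/Sy = {implied:.6f}")

# Shifting X (change of origin) leaves the slope unchanged;
# multiplying X by 10 (change of scale) divides the slope by 10.
print(ols_slope([xi + 100 for xi in x], y), ols_slope([10 * xi for xi in x], y))
```

Incidentally, with both variables standardized, $\hat\alpha_2$ equals the sample correlation coefficient between X and Y.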
6.10. The Engel expenditure curve relates a consumer's expenditure on a commodity to his or her total income. Letting Y = consumption expenditure on a commodity and X = consumer income, consider the following models:

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$
$$Y_i = \beta_1 + \beta_2 (1/X_i) + u_i$$
$$\ln Y_i = \ln\beta_1 + \beta_2 \ln X_i + u_i$$
$$\ln Y_i = \ln\beta_1 + \beta_2 (1/X_i) + u_i$$
$$Y_i = \beta_1 + \beta_2 \ln X_i + u_i$$

Which of these model(s) would you choose for the Engel expenditure curve and why? (Hint: Interpret the various slope coefficients, find out the expressions for elasticity of expenditure with respect to income, etc.)
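Following the hint, the elasticity for each model can be obtained from $\eta = (dY/dX)(X/Y)$. The Python sketch below records the resulting expressions (derived here by ordinary differentiation, not quoted from the text) and checks each one against a numerical derivative. The coefficient values and the evaluation point $X = 50$ are arbitrary, hypothetical choices.

```python
import math

b1, b2 = 2.0, 0.6        # hypothetical coefficient values, not estimates

# Systematic part of each Engel model, written as Y = f(X).
models = {
    "linear":         lambda x: b1 + b2 * x,
    "reciprocal":     lambda x: b1 + b2 / x,
    "log-log":        lambda x: math.exp(math.log(b1) + b2 * math.log(x)),
    "log-reciprocal": lambda x: math.exp(math.log(b1) + b2 / x),
    "lin-log":        lambda x: b1 + b2 * math.log(x),
}

# Analytical elasticities eta = (dY/dX)(X/Y), expressed in terms of X and Y.
elasticity = {
    "linear":         lambda x, y: b2 * x / y,
    "reciprocal":     lambda x, y: -b2 / (x * y),
    "log-log":        lambda x, y: b2,
    "log-reciprocal": lambda x, y: -b2 / x,
    "lin-log":        lambda x, y: b2 / y,
}

def numerical_elasticity(f, x, h=1e-6):
    """Central-difference estimate of (dY/dX)(X/Y) at x."""
    dydx = (f(x + h) - f(x - h)) / (2 * h)
    return dydx * x / f(x)

x0 = 50.0
for name, f in models.items():
    y0 = f(x0)
    print(f"{name:15s} analytic = {elasticity[name](x0, y0): .6f}  "
          f"numeric = {numerical_elasticity(f, x0): .6f}")
```

Only the log-log model has a constant elasticity; in the others it changes with X (and, for some, with Y), which is worth weighing when choosing an Engel curve.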

6.11. Consider the following model:

$$Y_i = \frac{e^{\beta_1 + \beta_2 X_i}}{1 + e^{\beta_1 + \beta_2 X_i}}$$

As it stands, is this a linear regression model? If not, what "trick," if any, can you use to make it a linear regression model? How would you interpret the resulting model? Under what circumstances might such a model be appropriate?
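One standard trick is the logit transformation: if $0 < Y_i < 1$, the model implies $\ln[Y_i/(1 - Y_i)] = \beta_1 + \beta_2 X_i$, which is linear in the parameters. The Python sketch below generates Y from the model with hypothetical values $\beta_1 = -3$ and $\beta_2 = 0.5$ (no error term, for clarity), applies the transformation, and recovers both parameters by ordinary least squares.

```python
import math
import statistics

def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
    return ybar - slope * xbar, slope

def logistic(z):
    return math.exp(z) / (1.0 + math.exp(z))

def logit(p):
    """Inverse of logistic: ln[p / (1 - p)], defined only for 0 < p < 1."""
    return math.log(p / (1.0 - p))

beta1, beta2 = -3.0, 0.5                        # hypothetical parameters
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [logistic(beta1 + beta2 * xi) for xi in x]  # nonlinear in beta1, beta2

b1_hat, b2_hat = ols(x, [logit(yi) for yi in y])  # linear after the transform
print(b1_hat, b2_hat)
```

With real data an error term would be attached to the transformed equation, and observations with Y exactly equal to 0 or 1 would need separate treatment, because the logit is undefined there.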

6.12. Graph the following models (for ease of exposition, we have omitted the observation subscript, i):

a. $Y = \beta_1 X^{\beta_2}$, for $\beta_2 > 1$, $\beta_2 = 1$, $0 < \beta_2 < 1$.
b. $Y = \beta_1 e^{\beta_2 X}$, for $\beta_2 > 0$ and $\beta_2 < 0$.

Discuss where such models might be appropriate.
6.13. Consider the following regression:

$$\widehat{\text{SPI}}_i = -17.8 + 33.2\,\text{Gini}_i$$
$$\text{se} = (4.9)\qquad(11.8)\qquad r^2 = 0.16$$

where SPI = index of sociopolitical instability, average for 1960-1985, and Gini = Gini coefficient for 1975 or the closest available year within the range of 1970-1980. The sample consists of 40 countries.

The Gini coefficient is a measure of income inequality, and it lies between 0 and 1. The closer it is to 0, the greater the income equality; the closer it is to 1, the greater the income inequality.

a. How do you interpret this regression?

b. Suppose the Gini coefficient increases from 0.25 to 0.55. By how much does SPI go up? What does that mean in practice?

c. Is the estimated slope coefficient statistically significant at the 5% level? Show the necessary calculations.

d. Based on the preceding regression, can you argue that countries with greater income inequality are politically unstable?
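The arithmetic behind parts (b) and (c) can be sketched in Python as follows. The critical value is an assumption supplied here rather than taken from the text: with 40 observations and two estimated parameters there are 38 degrees of freedom, and the two-tailed 5% critical t value for 38 df is approximately 2.024 (read from a t table).

```python
# Estimates reported in Exercise 6.13.
b2, se_b2 = 33.2, 11.8
n, k = 40, 2                      # observations, estimated parameters

# (b) Change in fitted SPI when Gini rises from 0.25 to 0.55.
delta_spi = b2 * (0.55 - 0.25)

# (c) t ratio for H0: beta2 = 0 against an approximate two-tailed 5%
#     critical value for n - k = 38 df (taken from a t table).
t_ratio = b2 / se_b2
t_crit_38 = 2.024

print(f"change in SPI = {delta_spi:.2f}")
print(f"t = {t_ratio:.3f}, df = {n - k}, |t| > critical value: {abs(t_ratio) > t_crit_38}")
```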


*See David N. Weil, Economic Growth, Addison-Wesley, Boston, 2005, p. 392.


Empirical Exercises 
6.14. You are given the data in Table 6.7.* Fit the following model to these data and obtain the usual regression statistics and interpret the results:

$$\frac{100}{100 - Y_i} = \beta_1 + \beta_2\left(\frac{1}{X_i}\right)$$

Table 6.7

Y    86   79   76   69   62   52   51   51   48
X     3    7   12   17   35   45   55   70  120
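A minimal Python sketch of the fit, using only the nine (Y, X) pairs legible in Table 6.7 as reproduced here (if your copy of the table contains further observations, add them to the lists). The regressand is $100/(100 - Y_i)$ and the regressor is $1/X_i$; the code reports the coefficients, their standard errors and $r^2$ from the usual two-variable formulas.

```python
import math
import statistics

# Observations legible in Table 6.7 as reproduced here.
Y = [86, 79, 76, 69, 62, 52, 51, 51, 48]
X = [3, 7, 12, 17, 35, 45, 55, 70, 120]

# Transform to the linear-in-parameters form 100/(100 - Y) = b1 + b2 (1/X).
z = [100 / (100 - yi) for yi in Y]
w = [1 / xi for xi in X]

n = len(z)
wbar, zbar = statistics.fmean(w), statistics.fmean(z)
sww = sum((wi - wbar) ** 2 for wi in w)
swz = sum((wi - wbar) * (zi - zbar) for wi, zi in zip(w, z))

b2 = swz / sww
b1 = zbar - b2 * wbar
resid = [zi - (b1 + b2 * wi) for wi, zi in zip(w, z)]
rss = sum(e ** 2 for e in resid)
tss = sum((zi - zbar) ** 2 for zi in z)
sigma2 = rss / (n - 2)                       # unbiased estimator of sigma^2
se_b2 = math.sqrt(sigma2 / sww)
se_b1 = math.sqrt(sigma2 * sum(wi ** 2 for wi in w) / (n * sww))
r2 = 1 - rss / tss

print(f"b1 = {b1:.4f} (se {se_b1:.4f})")
print(f"b2 = {b2:.4f} (se {se_b2:.4f})")
print(f"r^2 = {r2:.4f}")
```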


6.15. To study the relationship between investment rate (investment expenditure as a ratio of the GDP) and savings rate (savings as a ratio of GDP), Martin Feldstein and Charles Horioka obtained data for a sample of 21 countries. (See Table 6.8.) The investment rate for each country is the average rate for the period 1960-1974, and the savings rate is the average savings rate for the period 1960-1974. The variable Invrate represents the investment rate and the variable Savrate represents the savings rate.†

a. Plot the investment rate against the savings rate.


Table 6.8

Country        SAVRATE   INVRATE
Australia       0.250     0.270
Austria         0.285     0.282
Belgium         0.235     0.224
Canada          0.219     0.231
Denmark         0.202     0.224
Finland         0.288     0.305
France          0.254     0.260
Germany         0.271     0.264
Greece          0.219     0.248
Ireland         0.190     0.218
Italy           0.235     0.224
Japan           0.372     0.368
Luxembourg      0.313     0.277
Netherlands     0.273     0.266
New Zealand     0.232     0.249
Norway          0.278     0.299
Spain           0.235     0.241
Sweden          0.241     0.242
Switzerland     0.297     0.297
U.K.            0.184     0.192
U.S.            0.186     0.186

Note: SAVRATE = Savings as a ratio of GDP.
INVRATE = Investment expenditure as a ratio of GDP.
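For parts (c) and (g) of Exercise 6.15, here is a minimal Python sketch using the Table 6.8 figures: it fits the linear and the log-linear model by least squares and computes the elasticity of Invrate with respect to Savrate. In the linear model the elasticity varies along the line, so the sketch evaluates it at the sample means, $\hat\beta_2\,\overline{\text{Savrate}}/\overline{\text{Invrate}}$; that is a common convention assumed here, not the only possible choice. In the log-linear model the slope $\hat\alpha_2$ is itself the elasticity.

```python
import math
import statistics

# Table 6.8: (SAVRATE, INVRATE), averages for 1960-1974.
data = {
    "Australia": (0.250, 0.270), "Austria": (0.285, 0.282),
    "Belgium": (0.235, 0.224), "Canada": (0.219, 0.231),
    "Denmark": (0.202, 0.224), "Finland": (0.288, 0.305),
    "France": (0.254, 0.260), "Germany": (0.271, 0.264),
    "Greece": (0.219, 0.248), "Ireland": (0.190, 0.218),
    "Italy": (0.235, 0.224), "Japan": (0.372, 0.368),
    "Luxembourg": (0.313, 0.277), "Netherlands": (0.273, 0.266),
    "New Zealand": (0.232, 0.249), "Norway": (0.278, 0.299),
    "Spain": (0.235, 0.241), "Sweden": (0.241, 0.242),
    "Switzerland": (0.297, 0.297), "U.K.": (0.184, 0.192),
    "U.S.": (0.186, 0.186),
}
sav = [s for s, _ in data.values()]
inv = [i for _, i in data.values()]

def fit(x, y):
    """Least-squares intercept, slope and r^2 of y on x."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    tss = sum((b - ybar) ** 2 for b in y)
    return intercept, slope, 1 - rss / tss

b1, b2, r2_lin = fit(sav, inv)
a1, a2, r2_log = fit([math.log(s) for s in sav], [math.log(i) for i in inv])
elast_linear_at_means = b2 * statistics.fmean(sav) / statistics.fmean(inv)

print(f"linear:     Invrate = {b1:.4f} + {b2:.4f} Savrate, r^2 = {r2_lin:.4f}")
print(f"log-linear: ln Invrate = {a1:.4f} + {a2:.4f} ln Savrate, r^2 = {r2_log:.4f}")
print(f"elasticity: linear (at means) = {elast_linear_at_means:.4f}, log-linear = {a2:.4f}")
```

As part (f) hints, the two $r^2$ values are not directly comparable, because the regressands (Invrate and ln Invrate) differ.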


*Adapted from J. Johnston, Econometric Methods, 3d ed., McGraw-Hill, New York, 1984, p. 87. Actually, this is taken from an econometric examination of Oxford University in 1975.

†Martin Feldstein and Charles Horioka, "Domestic Saving and International Capital Flows," Economic Journal, vol. 90, June 1980, pp. 314-329. Data reproduced from Michael P. Murray, Econometrics: A Modern Introduction, Addison-Wesley, Boston, 2006.
b. Based on this plot, do you think the following models might fit the data equally well?

$$\text{Invrate}_i = \beta_1 + \beta_2\,\text{Savrate}_i + u_i$$
$$\ln \text{Invrate}_i = \alpha_1 + \alpha_2 \ln \text{Savrate}_i + u_i$$

c. Estimate both of these models and obtain the usual statistics.

d. How would you interpret the slope coefficient in the linear model? In the log-linear model? Is there a difference in the interpretation of these coefficients?

e. How would you interpret the intercepts in the two models? Is there a difference in your interpretation?

f. Would you compare the two $r^2$ coefficients? Why or why not?

g. Suppose you want to compute the elasticity of the investment rate with respect to the savings rate. How would you obtain this elasticity for the linear model? For the log-linear model? Note that this elasticity is defined as the percentage change in the investment rate for a percentage change in the savings rate.

h. Given the results of the two regression models, which model would you prefer? Why?

6.16. Table 6.9* gives the variable definitions for various kinds of expenditures, total expenditure, income, age of household head, and the number of children for a sample of 1,519 households drawn from the 1980-1982 British Family Expenditure Surveys.

The actual dataset can be found on this text's website. The data include only households with one or two children living in Greater London. The sample does not include self-employed or retired households.

a. Using the data on food expenditure in relation to total expenditure, determine which of the models 
summarized in Table 6.6 fits the data. 


Table 6.9

List of Variables:

wfood = budget share for food expenditure
wfuel = budget share for fuel expenditure
wcloth = budget share for clothing expenditure
walc = budget share for alcohol expenditure
wtrans = budget share for transportation expenditure
wother = budget share for other expenditures
totexp = total household expenditure (rounded to the nearest 10 U.K. pounds sterling)
income = total net household income (rounded to the nearest 10 U.K. pounds sterling)
age = age of household head
nk = number of children

The budget share of a commodity, say food, is defined as:

$$\text{wfood} = \frac{\text{expenditure on food}}{\text{total expenditure}}$$


*The data are from Richard Blundell and Krishna Pendakur, "Semiparametric Estimation and Consumer Demand," Journal of Applied Econometrics, vol. 13, no. 5, 1998, pp. 435-462. Data reproduced from R. Carter Hill, William E. Griffiths, and George G. Judge, Undergraduate Econometrics, 2d ed., John Wiley & Sons, New York, 2001.

b. Based on the regression results obtained in (a), which model seems appropriate in the present instance?

Note: Save these data for further analysis in the next chapter on multiple regression. 

6.17. Refer to Table 6.3. Find out the rate of growth of expenditure on durable goods. What is the estimated 
semielasticity? Interpret your results. Would it make sense to run a double-log regression with expen- 
diture on durable goods as the regressand and time as the regressor? How would you interpret the 
slope coefficient in this case? 

6.18. From the data given in Table 6.3, find out the growth rate of expenditure on nondurable goods and 
compare your results with those obtained from Exercise 6.17. 
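Exercises 6.17 and 6.18 rest on the log-lin (growth) model $\ln Y_t = \beta_1 + \beta_2 t + u_t$, in which $\beta_2$ is the instantaneous rate of growth and $e^{\beta_2} - 1$ the compound rate of growth per period. The Python sketch below illustrates the calculation on hypothetical, noise-free data growing at exactly 5% per period; it is not the Table 6.3 series, which you would substitute for `y`.

```python
import math
import statistics

def growth_rates(y):
    """Fit ln y_t = b1 + b2 t by OLS (t = 1, ..., n) and return
    (instantaneous growth rate b2, compound growth rate e**b2 - 1)."""
    t = list(range(1, len(y) + 1))
    ln_y = [math.log(v) for v in y]
    tbar, lbar = statistics.fmean(t), statistics.fmean(ln_y)
    b2 = (sum((ti - tbar) * (li - lbar) for ti, li in zip(t, ln_y))
          / sum((ti - tbar) ** 2 for ti in t))
    return b2, math.exp(b2) - 1

# Hypothetical series growing by exactly 5% per period.
y = [100 * 1.05 ** k for k in range(12)]
inst, compound = growth_rates(y)
print(f"instantaneous: {100 * inst:.3f}%  compound: {100 * compound:.3f}%")
```

The semielasticity asked for in Exercise 6.17 is the slope $\beta_2$; multiplied by 100 it gives the percentage change in Y per unit of time.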

6.19. Table 6.10 gives data for the U.K. on total consumer expenditure (in £ millions) and advertising expenditure (in £ millions) for 29 product categories.*

a. Considering the various functional forms we have discussed in the chapter, which functional form 
might fit the data given in Table 6.10? 

b. Estimate the parameters of the chosen regression model and interpret your results. 

c. If you take the ratio of advertising expenditure to total consumer expenditure, what do you 
observe? Are there any product categories for which this ratio seems unusually high? Is there 
anything special about these product categories that might explain the relatively high expenditure 
on advertising? 

6.20. Refer to Example 3.3 in Chapter 3 to complete the following: 

a. Plot cell phone demand against purchasing power (PP) adjusted per capita income. 

b. Plot the log of cell phone demand against the log of PP-adjusted per capita income. 

c. What is the difference between the two graphs? 

d. From these two graphs, do you think that a double-log model might provide a better fit to the data 
than the linear model? Estimate the double-log model. 

e. How do you interpret the slope coefficient in the double-log model? 

f. Is the estimated slope coefficient in the double-log model statistically significant at the 5% level? 

g. How would you estimate the elasticity of cell phone demand with respect to PP-adjusted income for the linear model given in Eq. (3.7.3)? What additional information, if any, do you need? Call the estimated elasticity the income elasticity.

h. Is there a difference between the income elasticity estimated from the double-log model and that 
estimated from the linear model? If so, which model would you choose? 

6.21. Repeat Exercise 6.20 but refer to the demand for personal computers given in Eq. (3.7.4). Is there a 
difference between the estimated income elasticities for cell phones and personal computers? If so, 
what factors might account for the difference? 

6.22. Refer to the data in Table 3.3. To find out if people who own PCs also own cell phones, run the following regression:

$$\text{CellPhone}_i = \beta_1 + \beta_2\,\text{PCs}_i + u_i$$

a. Estimate the parameters of this regression.

b. Is the estimated slope coefficient statistically significant?

c. Does it matter if you run the following regression?

$$\text{PCs}_i = \alpha_1 + \alpha_2\,\text{CellPhone}_i + u_i$$

d. Estimate the preceding regression and test the statistical significance of the estimated slope coefficient.

e. How would you decide between the first and the second regression?
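For parts (c) through (e), two algebraic facts about two-variable regression may help (standard results, checked numerically here rather than quoted from the text): the product of the slopes from regressing Y on X and X on Y equals $r^2$, and the t ratios of the two slope coefficients are identical. The Python sketch uses hypothetical ownership figures, not the Table 3.3 data.

```python
import math
import statistics

def sums(x, y):
    """Mean-adjusted sums of squares and cross products: Sxx, Syy, Sxy."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return sxx, syy, sxy

def slope_and_t(x, y):
    """OLS slope of y on x (intercept included) and its t ratio for H0: slope = 0."""
    sxx, syy, sxy = sums(x, y)
    slope = sxy / sxx
    rss = syy - slope * sxy                    # residual sum of squares
    se = math.sqrt(rss / (len(x) - 2) / sxx)
    return slope, slope / se

def r_squared(x, y):
    sxx, syy, sxy = sums(x, y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical ownership rates (per 100 people), for illustration only.
pcs = [5.2, 11.8, 20.4, 31.0, 38.5, 45.1, 52.3, 60.7]
cell = [18.0, 30.5, 41.2, 50.3, 66.8, 70.1, 79.4, 85.0]

b_cell_on_pc, t1 = slope_and_t(pcs, cell)
b_pc_on_cell, t2 = slope_and_t(cell, pcs)
print(f"slope product = {b_cell_on_pc * b_pc_on_cell:.6f}, r^2 = {r_squared(pcs, cell):.6f}")
print(f"t ratios: {t1:.4f} and {t2:.4f}")
```

Because the two t ratios coincide, the significance tests in parts (b) and (d) cannot discriminate between the regressions; the choice in part (e) has to rest on which variable you regard as depending on the other.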


*These data are from Advertising Statistics Year Book, 1996, and are reproduced from http://www.economicswebinstitute.org/ecdata.htm.
Table 6.10 Advertising Expenditure and Total Expenditure (in £ millions) for 29 Product Categories in the U.K.

obs      ADEXP       CONEXP      RATIO
 1     87957.00    13599.00    0.006468
 2     23578.00    4699.000    0.005018
 3     16345.00    5473.000    0.002986
 4     6550.000    6119.000    0.001070
 5     10230.00    8811.000    0.001161
 6     9127.000    1142.000    0.007992
 7     1675.000    143.0000    0.011713
 8     1110.000    138.0000    0.008043
 9     3351.000    85.00000    0.039424
10     1140.000    108.0000    0.010556
11     6376.000    307.0000    0.020769
12     4500.000    1545.000    0.002913
13     1899.000    943.0000    0.002014
14     10101.00    369.0000    0.027374
15     3831.000    285.0000    0.013442
16     99528.00    1052.000    0.094608
17     15855.00    862.0000    0.018393
18     8227.000    84.00000    0.105083
19     54517.00    1174.000    0.046437
20     49593.00    2531.000    0.019594
21     39664.00    408.0000    0.097216
22     327.0000    295.0000    0.001108
23     22549.00    488.0000    0.046207
24     416422.0    19200.00    0.021689
25     14212.00    94.00000    0.151191
26     54174.00    5320.000    0.010183
27     20218.00    357.0000    0.056633
28     11041.00    159.0000    0.069440
29     22542.00    244.0000    0.092385

Note: ADEXP = Advertising expenditure (£, millions)
CONEXP = Total consumer expenditure (£, millions)
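For part (c) of Exercise 6.19, a minimal Python sketch that ranks the categories by the printed RATIO and recomputes the ratio from the two expenditure columns. Recomputing shows that the printed RATIO equals ADEXP/CONEXP divided by 1,000 in 28 of the 29 rows, which suggests the columns may not both be in £ millions as the note states; check the units in the original source before interpreting magnitudes. Observation 18 does not match even after this scaling, so one of its figures may have been mis-transcribed.

```python
import statistics

# Table 6.10: (obs, ADEXP, CONEXP, printed RATIO).
rows = [
    (1, 87957.0, 13599.0, 0.006468), (2, 23578.0, 4699.0, 0.005018),
    (3, 16345.0, 5473.0, 0.002986), (4, 6550.0, 6119.0, 0.001070),
    (5, 10230.0, 8811.0, 0.001161), (6, 9127.0, 1142.0, 0.007992),
    (7, 1675.0, 143.0, 0.011713), (8, 1110.0, 138.0, 0.008043),
    (9, 3351.0, 85.0, 0.039424), (10, 1140.0, 108.0, 0.010556),
    (11, 6376.0, 307.0, 0.020769), (12, 4500.0, 1545.0, 0.002913),
    (13, 1899.0, 943.0, 0.002014), (14, 10101.0, 369.0, 0.027374),
    (15, 3831.0, 285.0, 0.013442), (16, 99528.0, 1052.0, 0.094608),
    (17, 15855.0, 862.0, 0.018393), (18, 8227.0, 84.0, 0.105083),
    (19, 54517.0, 1174.0, 0.046437), (20, 49593.0, 2531.0, 0.019594),
    (21, 39664.0, 408.0, 0.097216), (22, 327.0, 295.0, 0.001108),
    (23, 22549.0, 488.0, 0.046207), (24, 416422.0, 19200.0, 0.021689),
    (25, 14212.0, 94.0, 0.151191), (26, 54174.0, 5320.0, 0.010183),
    (27, 20218.0, 357.0, 0.056633), (28, 11041.0, 159.0, 0.069440),
    (29, 22542.0, 244.0, 0.092385),
]
ratio = {obs: r for obs, _, _, r in rows}

# Rows where the printed RATIO disagrees with ADEXP / CONEXP / 1000
# by more than the six-decimal rounding of the table.
mismatches = [obs for obs, adexp, conexp, r in rows
              if abs(adexp / conexp / 1000 - r) > 1e-5]

ranked = sorted(ratio, key=ratio.get, reverse=True)
median_ratio = statistics.median(list(ratio.values()))

print("median printed RATIO:", median_ratio)
print("five highest (obs, RATIO):", [(obs, ratio[obs]) for obs in ranked[:5]])
print("rows inconsistent with ADEXP / CONEXP / 1000:", mismatches)
```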


Key to Multiple Choice Questions

1. (illegible)   2. (b)   3. (c)   4. (a)   5. (a)   6. (a)   7. (d)   8. (c)   9. (d)
10. (b)   11. (a)   12. (illegible)   13. (a)   14. (c)   15. (d)   16. (b)   17. (a)   18. (a)
19. (d)   20. (b)
Appendix 6A

6A.1 Derivation of Least-Squares Estimators for Regression through the Origin

We want to minimize

$$\sum \hat u_i^2 = \sum (Y_i - \hat\beta_2 X_i)^2 \qquad (1)$$

with respect to $\hat\beta_2$.


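Differentiating (1) with respect to $\hat\beta_2$, we obtain

$$\frac{d\sum \hat u_i^2}{d\hat\beta_2} = 2\sum (Y_i - \hat\beta_2 X_i)(-X_i) \qquad (2)$$

Setting Eq. (2) equal to zero and simplifying, we get

$$\hat\beta_2 = \frac{\sum X_i Y_i}{\sum X_i^2} \qquad (6.1.6) = (3)$$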


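Now substituting the PRF $Y_i = \beta_2 X_i + u_i$ into this equation, we obtain

$$\hat\beta_2 = \frac{\sum X_i(\beta_2 X_i + u_i)}{\sum X_i^2} = \beta_2 + \frac{\sum X_i u_i}{\sum X_i^2} \qquad (4)$$

[Note: $E(\hat\beta_2) = \beta_2$.] Therefore,

$$E(\hat\beta_2 - \beta_2)^2 = E\left[\frac{\sum X_i u_i}{\sum X_i^2}\right]^2 \qquad (5)$$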
Expanding the right-hand side of Eq. (5) and noting that the $X_i$ are nonstochastic and the $u_i$ are homoscedastic and uncorrelated, we obtain

$$\operatorname{var}(\hat\beta_2) = E(\hat\beta_2 - \beta_2)^2 = \frac{\sigma^2}{\sum X_i^2} \qquad (6.1.7) = (6)$$


Incidentally, note that from Eq. (2) we get, after equating it to zero,

$$\sum \hat u_i X_i = 0 \qquad (7)$$

From Appendix 3A, Section 3A.1, we see that when the intercept term is present in the model, we get in addition to Eq. (7) the condition $\sum \hat u_i = 0$. From the mathematics just given it should be clear why the regression through the origin model may not have the residual sum, $\sum \hat u_i$, equal to zero.

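Suppose we want to impose the condition that $\sum \hat u_i = 0$. In that case we have

$$\sum Y_i = \hat\beta_2 \sum X_i + \sum \hat u_i = \hat\beta_2 \sum X_i, \quad \text{since } \sum \hat u_i = 0 \text{ by construction} \qquad (8)$$

This expression then gives

$$\hat\beta_2 = \frac{\sum Y_i}{\sum X_i} = \frac{\bar Y}{\bar X} = \frac{\text{mean value of } Y}{\text{mean value of } X} \qquad (9)$$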
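But this estimator is not the same as Eq. (3) above or Eq. (6.1.6). Because $\sum Y_i/\sum X_i = \beta_2 + \sum u_i/\sum X_i$, the estimator of Eq. (9) is still unbiased when the $X_i$ are nonstochastic; what it loses is efficiency. It is not the least-squares estimator, and by the Gauss-Markov theorem its variance, $n\sigma^2/(\sum X_i)^2$, is at least as large as the variance $\sigma^2/\sum X_i^2$ of the $\hat\beta_2$ of Eq. (3), with equality only if all the $X_i$ are equal.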

The upshot is that, in regression through the origin, we cannot have both $\sum \hat u_i X_i$ and $\sum \hat u_i$ equal to zero, as in the conventional model. The only condition that is satisfied is that $\sum \hat u_i X_i$ is zero.
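The two candidate estimators, Eq. (3) and Eq. (9), can be compared by simulation. The Python sketch below assumes hypothetical values $\beta_2 = 1.5$ and $\sigma = 2$, fixed $X_i$ chosen for illustration, and normal disturbances. It draws many samples from $Y_i = \beta_2 X_i + u_i$, applies both formulas to each sample, and reports the mean and variance of each estimator across replications alongside the variances $\sigma^2/\sum X_i^2$ and $n\sigma^2/(\sum X_i)^2$ implied by $\hat\beta_2 = \beta_2 + \sum X_i u_i/\sum X_i^2$ and $\sum Y_i/\sum X_i = \beta_2 + \sum u_i/\sum X_i$.

```python
import random
import statistics

random.seed(12345)                  # reproducible draws
beta2, sigma = 1.5, 2.0             # hypothetical "true" values
X = [1, 2, 3, 5, 8, 13, 21]         # fixed (nonstochastic) regressor
n, sum_x, sum_x2 = len(X), sum(X), sum(x * x for x in X)

ls_draws, ratio_draws = [], []
for _ in range(20000):
    Y = [beta2 * x + random.gauss(0.0, sigma) for x in X]
    ls_draws.append(sum(x * y for x, y in zip(X, Y)) / sum_x2)   # Eq. (3)
    ratio_draws.append(sum(Y) / sum_x)                           # Eq. (9)

var_ls, var_ratio = statistics.variance(ls_draws), statistics.variance(ratio_draws)
print("Eq. (3): mean %.4f  var %.5f  theory %.5f"
      % (statistics.fmean(ls_draws), var_ls, sigma ** 2 / sum_x2))
print("Eq. (9): mean %.4f  var %.5f  theory %.5f"
      % (statistics.fmean(ratio_draws), var_ratio, n * sigma ** 2 / sum_x ** 2))
```

Under these assumptions both averages settle near $\beta_2$, while the Eq. (9) estimator shows the larger sampling variance.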

Recall that

$$Y_i = \hat Y_i + \hat u_i \qquad (2.6.3)$$

Summing this equation on both sides and dividing by N, the sample size, we obtain

$$\bar Y = \bar{\hat Y} + \bar{\hat u} \qquad (10)$$

Since for the zero intercept model $\sum \hat u_i$ and, therefore, $\bar{\hat u}$ need not be zero, it then follows that

$$\bar Y \neq \bar{\hat Y} \qquad (11)$$

that is, the mean of actual Y values need not be equal to the mean of the estimated Y values; the two mean values are identical for the intercept-present model, as can be seen from Eq. (3.1.10).

It was noted that, for the zero-intercept model, $r^2$ can be negative, whereas for the conventional model it can never be negative. This can be shown as follows.

Using Eq. (3.5.5a), we can write

$$r^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\sum \hat u_i^2}{\sum y_i^2} \qquad (12)$$

Now for the conventional, or intercept-present, model, Eq. (3.3.6) shows that

$$\text{RSS} = \sum \hat u_i^2 = \sum y_i^2 - \hat\beta_2^2 \sum x_i^2 \le \sum y_i^2 \qquad (13)$$

with equality only if $\hat\beta_2$ is zero (i.e., X has no influence on Y whatsoever). That is, for the conventional model, RSS ≤ TSS, or, $r^2$ can never be negative.

For the zero-intercept model it can be shown analogously that

$$\text{RSS} = \sum \hat u_i^2 = \sum Y_i^2 - \hat\beta_2^2 \sum X_i^2 \qquad (14)$$

(Note: The sums of squares of Y and X are not mean-adjusted.) Now there is no guarantee that this RSS will always be less than $\sum y_i^2 = \sum Y_i^2 - N\bar Y^2$ (the TSS), which suggests that RSS can be greater than TSS, implying that $r^2$, as conventionally defined, can be negative. Incidentally, notice that in this case RSS will be greater than TSS if $\hat\beta_2^2 \sum X_i^2 < N\bar Y^2$.
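A small made-up dataset makes these points concrete. In the Python sketch below (the numbers are hypothetical, chosen so that Y falls as X rises while staying far from the origin), the zero-intercept fit satisfies Eq. (7) but not $\sum \hat u_i = 0$, the mean of the fitted values differs from $\bar Y$ as allowed by Eq. (11), and the conventionally defined $r^2$ of Eq. (12) is negative. Exact rational arithmetic (`fractions.Fraction`) avoids rounding noise.

```python
from fractions import Fraction

X = [1, 2, 3, 4, 5]
Y = [10, 9, 8, 7, 6]
n = len(X)

# Eq. (3): least-squares slope for regression through the origin.
b2 = Fraction(sum(x * y for x, y in zip(X, Y)), sum(x * x for x in X))
fitted = [b2 * x for x in X]
resid = [y - f for y, f in zip(Y, fitted)]

ybar = Fraction(sum(Y), n)
rss = sum(e * e for e in resid)
tss = sum((y - ybar) ** 2 for y in Y)    # mean-adjusted, as in Eq. (12)
r2 = 1 - rss / tss

print("slope:", b2)                                              # 2
print("sum of u_hat * X:", sum(e * x for e, x in zip(resid, X)))  # 0
print("sum of u_hat:", sum(resid))                               # 10
print("mean Y vs mean Y_hat:", ybar, sum(fitted) / n)            # 8 6
print("RSS, TSS, r^2:", rss, tss, r2)                            # 110 10 -10
```

Here $\hat\beta_2^2 \sum X_i^2 = 220$ is smaller than $N\bar Y^2 = 320$, which is exactly the condition under which RSS exceeds TSS.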


6A.2 Proof that a Standardized Variable Has Zero Mean and Unit Variance

Consider the random variable (r.v.) Y with the (sample) mean value $\bar Y$ and (sample) standard deviation $S_y$. Define

$$Y_i^* = \frac{Y_i - \bar Y}{S_y} \qquad (15)$$

Hence $Y_i^*$ is a standardized variable. Notice that standardization involves a dual operation: (1) change of the origin, which is the numerator of Eq. (15), and (2) change of scale, which is the denominator.

Now

$$\bar Y^* = \frac{1}{S_y}\,\frac{\sum (Y_i - \bar Y)}{n} = 0 \qquad (16)$$

since the sum of deviations of a variable from its mean value is always zero. Hence the mean value of the standardized variable is zero. (Note: We could pull out the $S_y$ term from the summation sign because it does not vary with i.)
Now

$$S_{y^*}^2 = \frac{\sum (Y_i^* - \bar Y^*)^2}{n-1} = \frac{\sum (Y_i^*)^2}{n-1} = \frac{1}{(n-1)S_y^2}\sum (Y_i - \bar Y)^2 = \frac{(n-1)S_y^2}{(n-1)S_y^2} = 1 \qquad (17)$$

Note that

$$S_y^2 = \frac{\sum (Y_i - \bar Y)^2}{n-1}$$

which is the sample variance of Y.
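A quick numerical check of Eqs. (15) to (17) in Python, on a hypothetical sample: after standardizing with the sample mean and the $(n-1)$-divisor standard deviation, the mean is zero and the sample variance is one, up to floating-point rounding.

```python
import statistics

y = [12.0, 15.5, 9.0, 20.25, 17.0, 11.75]              # hypothetical sample
ybar, s_y = statistics.fmean(y), statistics.stdev(y)   # stdev uses divisor n - 1

y_star = [(v - ybar) / s_y for v in y]                  # Eq. (15)

print(statistics.fmean(y_star))      # approximately 0, Eq. (16)
print(statistics.variance(y_star))   # approximately 1, Eq. (17)
```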
6A.3 Logarithms

Consider the numbers 5 and 25. We know that

$$25 = 5^2 \qquad (18)$$

We say that the exponent 2 is the logarithm of 25 to the base 5. More formally, the logarithm of a number (e.g., 25) to a given base (e.g., 5) is the power (2) to which the base (5) must be raised to obtain the given number (25).

More generally, if

$$Y = b^X \qquad (b > 0,\ b \neq 1) \qquad (19)$$

then

$$\log_b Y = X \qquad (20)$$


In mathematics the function (19) is called an exponential function and the function (20) is called the logarithmic function. 
As is clear from Eqs. (19) and (20), one function is the inverse of the other function. 

Although any positive base other than 1 can be used, in practice the two commonly used bases are 10 and the mathematical number e = 2.71828....

Logarithms to base 10 are called common logarithms. Thus,

$$\log_{10} 100 = 2 \qquad \log_{10} 30 \approx 1.48$$

That is, in the first case, $100 = 10^2$, and in the latter case, $30 \approx 10^{1.48}$.

Logarithms to the base e are called natural logarithms. Thus,

$$\log_e 100 \approx 4.6052 \quad \text{and} \quad \log_e 30 \approx 3.4012$$

All these calculations can be done routinely on a hand calculator.

By convention, the logarithm to base 10 is denoted by the letters log and to the base e by ln. Thus, in the preceding example, we can write log 100 or log 30 or ln 100 or ln 30.

There is a fixed relationship between the common log and natural log, which is

$$\ln X = 2.3026 \log X \qquad (21)$$

That is, the natural log of the number X is equal to 2.3026 times the log of X to the base 10. Thus,

$$\ln 30 = 2.3026 \log 30 = 2.3026(1.4771) \approx 3.4012$$

as before. Therefore, it does not matter whether one uses common or natural logs. But in mathematics the base that is usually preferred is e, that is, the natural logarithm. Hence, in this book all logs are natural logs, unless stated otherwise. Of course, we can convert the log of a number from one base to the other using Eq. (21).
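The figures quoted above are easy to reproduce with Python's standard `math` module, which provides `math.log` (natural log, or any base via an optional second argument) and `math.log10`. The constant 2.3026 in Eq. (21) is $\ln 10$ rounded to four decimals.

```python
import math

print(math.log10(100), math.log10(30))   # 2.0 and about 1.4771
print(math.log(100), math.log(30))       # about 4.6052 and 3.4012
print(math.log(10))                      # about 2.302585, the constant in Eq. (21)

# Eq. (21): ln X = ln(10) * log10(X) for any positive X.
x = 30.0
print(math.isclose(math.log(x), math.log(10) * math.log10(x)))   # True

print(math.log(25, 5))                   # about 2.0: log of 25 to the base 5
```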
Keep in mind that logarithms of zero and of negative numbers are not defined. Thus, the log of (-5) or the ln(-5) is not defined.

Some properties of logarithms are as follows: If A and B are any positive numbers, then it can be shown that:

1. $\ln(A \times B) = \ln A + \ln B \qquad (22)$
That is, the log of the product of two (positive) numbers A and B is equal to the sum of their logs.
2. $\ln(A/B) = \ln A - \ln B \qquad (23)$
That is, the log of the ratio of A to B is the difference in the logs of A and B.
3. $\ln(A \pm B) \neq \ln A \pm \ln B \qquad (24)$
That is, the log of the sum or difference of A and B is not, in general, equal to the sum or difference of their logs.
4. $\ln(A^k) = k \ln A \qquad (25)$
That is, the log of A raised to power k is k times the log of A.
5. $\ln e = 1 \qquad (26)$
That is, the log of e to itself as a base is 1 (as is the log of 10 to the base 10).
6. $\ln 1 = 0 \qquad (27)$
That is, the natural log of the number 1 is zero (as is the common log of number 1).
7. If $Y = \ln X$, then $\dfrac{dY}{dX} = \dfrac{1}{X} \qquad (28)$
That is, the rate of change (i.e., the derivative) of Y with respect to X is 1 over X. The exponential and (natural) logarithmic functions are depicted in Figure 6A.1.


Figure 6A.1 Exponential and logarithmic functions: (a) exponential function, $Y = e^X$; (b) logarithmic function, $X = \ln Y$.


Although the number whose log is taken is always positive, the logarithm of that number can be positive as well as negative. It can be easily verified that

if $0 < Y < 1$, then $\ln Y < 0$
if $Y = 1$, then $\ln Y = 0$
if $Y > 1$, then $\ln Y > 0$

Also note that although the logarithmic curve shown in Figure 6A.1(b) is positively sloping, implying that the larger the number is, the larger its logarithmic value will be, the curve is increasing at a decreasing rate (mathematically, the second derivative of the function is negative). Thus, ln(10) ≈ 2.3026 and ln(20) ≈ 2.9957. That is, if a number is doubled, its logarithm does not double.
This is why the logarithm transformation is called a nonlinear transformation. This can also be seen from Eq. (28), which notes that if $Y = \ln X$, $dY/dX = 1/X$. This means that the slope of the logarithmic function depends on the value of X; that is, it is not constant (recall the definition of linearity in the variable).

Logarithms and percentages: Since $d(\ln X)/dX = 1/X$, or $d(\ln X) = dX/X$, for very small changes the change in ln X is equal to the relative or proportional change in X. In practice, if the change in X is reasonably small, the preceding relationship can be written as: the change in ln X ≈ the relative change in X, where ≈ means approximately.

Thus, for small changes,

$$(\ln X_t - \ln X_{t-1}) \approx \frac{X_t - X_{t-1}}{X_{t-1}} = \text{relative change in } X$$
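How small is "small"? A short Python sketch (the starting value 100 and the percentage changes are arbitrary choices) compares the log difference with the relative change: the two are very close for a 1% change and drift apart noticeably for large changes.

```python
import math

def log_vs_relative(x_prev, x_now):
    """Return (ln x_now - ln x_prev, relative change (x_now - x_prev) / x_prev)."""
    return math.log(x_now) - math.log(x_prev), (x_now - x_prev) / x_prev

for pct in (1, 5, 10, 25, 50):
    log_diff, rel = log_vs_relative(100.0, 100.0 * (1 + pct / 100))
    print(f"{pct:>3}% change: log difference = {log_diff:.4f}, relative change = {rel:.4f}")
```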


6A.4 Growth Rate Formulas

Let the variable Y be a function of time, $Y = f(t)$, where t denotes time. The instantaneous (i.e., a point in time) rate of growth of Y, $g_Y$, is defined as

$$g_Y = \frac{dY/dt}{Y} = \frac{1}{Y}\frac{dY}{dt} \qquad (29)$$

where $dY/dt$ is the rate of change of Y with respect to time. Note that if we multiply $g_Y$ by 100, we get the percent rate of growth.

Now if we let $\ln Y = \ln f(t)$, where ln stands for the natural logarithm, then

$$\frac{d \ln Y}{dt} = \frac{1}{Y}\frac{dY}{dt} \qquad (30)$$

This is the same as Eq. (29).

Therefore, logarithmic transformations are very useful in computing growth rates, especially if Y is a function of some other time-dependent variables, as the following example will show. Let

$$Y = X \cdot Z \qquad (31)$$

where Y is nominal GDP, X is real GDP, and Z is the (GDP) price deflator. In words, the nominal GDP is real GDP multiplied by the (GDP) price deflator. All these variables are functions of time, as they vary over time.

Now taking logs on both sides of Eq. (31), we obtain:

$$\ln Y = \ln X + \ln Z \qquad (32)$$

Differentiating Eq. (32) with respect to time, we get

$$\frac{1}{Y}\frac{dY}{dt} = \frac{1}{X}\frac{dX}{dt} + \frac{1}{Z}\frac{dZ}{dt} \qquad (33)$$

that is, $g_Y = g_X + g_Z$, where g denotes growth rate.

In words, the instantaneous rate of growth of Y is equal to the sum of the instantaneous rates of growth of X and Z. In the present example, the instantaneous rate of growth of nominal GDP is equal to the sum of the instantaneous rate of growth of real GDP and the instantaneous rate of growth of the GDP price deflator.

More generally, the instantaneous rate of growth of a product is the sum of the instantaneous rates of growth of its components, and the result extends to the product of more than two variables.

In similar fashion, if we have

$$Y = \frac{X}{Z} \qquad (34)$$

then

$$\frac{1}{Y}\frac{dY}{dt} = \frac{1}{X}\frac{dX}{dt} - \frac{1}{Z}\frac{dZ}{dt} \qquad (35)$$

that is, $g_Y = g_X - g_Z$. In other words, the instantaneous rate of growth of Y is the instantaneous rate of growth of X minus the instantaneous rate of growth of Z. Thus if Y = per capita income, X = GDP, and Z = population, then the instantaneous rate of growth of per capita income is equal to the instantaneous rate of growth of GDP minus the instantaneous rate of growth of population.

Now let Y = X + Z. What is the rate of growth of Y? Let Y = total employment, X = blue collar employment, and Z = white collar employment. Since

$$\ln(X + Z) \neq \ln X + \ln Z,$$

the log trick does not apply directly. But since $dY/dt = dX/dt + dZ/dt$, dividing both sides by $Y = X + Z$ shows that

$$g_Y = \frac{X}{X+Z}\,g_X + \frac{Z}{X+Z}\,g_Z \qquad (36)$$

That is, the rate of growth of a sum is a weighted average of the rates of growth of its components. For our example, the rate of growth of total employment is a weighted average of the rates of growth of white collar employment and blue collar employment, the weights being the share of each component in total employment.
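Eqs. (33), (35) and (36) can be checked numerically. The Python sketch below assumes, purely for illustration, that X and Z grow exponentially at constant hypothetical rates of 3% and 1%, so that $g_X$ and $g_Z$ are known exactly; it approximates $d \ln Y/dt$ by a central difference and compares the result with each formula.

```python
import math

G_X, G_Z = 0.03, 0.01          # hypothetical constant growth rates

def X(t):
    """Hypothetical component (e.g. real GDP) growing at rate G_X."""
    return 500.0 * math.exp(G_X * t)

def Z(t):
    """Hypothetical component (e.g. a price index) growing at rate G_Z."""
    return 120.0 * math.exp(G_Z * t)

def growth_rate(f, t, h=1e-5):
    """Central-difference approximation to d ln f(t) / dt."""
    return (math.log(f(t + h)) - math.log(f(t - h))) / (2 * h)

t0 = 4.0
g_product = growth_rate(lambda t: X(t) * Z(t), t0)   # Eq. (33): G_X + G_Z
g_ratio = growth_rate(lambda t: X(t) / Z(t), t0)     # Eq. (35): G_X - G_Z
g_sum = growth_rate(lambda t: X(t) + Z(t), t0)       # Eq. (36)

w_x = X(t0) / (X(t0) + Z(t0))                        # share of X in the sum
print(g_product, G_X + G_Z)
print(g_ratio, G_X - G_Z)
print(g_sum, w_x * G_X + (1 - w_x) * G_Z)
```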


6A.5 Box-Cox Regression Model

Consider the following regression model

$$Y_i^{\lambda} = \beta_1 + \beta_2 X_i + u_i, \qquad Y > 0 \qquad (37)$$

where λ (Greek lambda) is a parameter, which may be negative, zero, or positive. Since Y is raised to the power λ, we will get several transformations of Y, depending on the value of λ.

Equation (37) is known as the Box-Cox regression model, named after the statisticians Box and Cox.¹ Depending on the value of λ, we have the following regression models, which are shown in tabular form:


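Value of λ    Regression model
1             $Y_i = \beta_1 + \beta_2 X_i + u_i$
2             $Y_i^2 = \beta_1 + \beta_2 X_i + u_i$
0.5           $\sqrt{Y_i} = \beta_1 + \beta_2 X_i + u_i$
0             $\ln Y_i = \beta_1 + \beta_2 X_i + u_i$
-0.5          $1/\sqrt{Y_i} = \beta_1 + \beta_2 X_i + u_i$
-1.0          $1/Y_i = \beta_1 + \beta_2 X_i + u_i$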


As you can see, linear and log-linear models are special cases of the Box-Cox family of transformations.

Of course, we can apply such transformations to the X variable(s) also. It is interesting to note that when λ is zero, we get the log-transformation of Y. (Strictly, this holds for the scaled form $(Y^{\lambda} - 1)/\lambda$, which tends to $\ln Y$ as $\lambda \to 0$; $Y^{\lambda}$ itself simply equals 1 at $\lambda = 0$.) The proof of this is slightly involved and is best left for the references. (Calculus-minded readers will have to recall l'Hôpital's rule.)

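But how do we actually determine the appropriate value of λ in a given situation? We cannot estimate Eq. (37) directly, for it involves not only the regression parameters $\beta_1$ and $\beta_2$ but also λ, which enters nonlinearly. But it can be shown that we can use the method of maximum likelihood to estimate all these parameters. Regression packages exist to do just that. We will not pursue this topic here because the procedure is somewhat involved.

However, we can proceed by trial and error. Choose several values of λ, transform Y accordingly, run regression (37), and obtain the residual sum of squares (RSS) for each transformed regression. Choose the value of λ that gives the minimum RSS.² Because the transformed Y is measured on a different scale for each λ, the RSS values are comparable only after the transformed values are put on a common scale, for instance by using $W_i = (Y_i^{\lambda} - 1)/(\lambda \dot Y^{\lambda - 1})$ for $\lambda \neq 0$ and $W_i = \dot Y \ln Y_i$ for $\lambda = 0$, where $\dot Y$ is the geometric mean of the $Y_i$.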
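A minimal Python sketch of the trial-and-error search, using the geometric-mean scaling so that the RSS values are comparable across λ. The data are hypothetical and noise-free, generated so that ln Y is exactly linear in X; the search should therefore pick λ = 0. With real data one would normally use a finer grid or maximum likelihood.

```python
import math
import statistics

def scaled_boxcox(y, lam):
    """Box-Cox transform scaled by the geometric mean of y, so that residual
    sums of squares are comparable across values of lam."""
    gm = math.exp(statistics.fmean([math.log(v) for v in y]))
    if lam == 0:
        return [gm * math.log(v) for v in y]
    return [(v ** lam - 1) / (lam * gm ** (lam - 1)) for v in y]

def rss_of_fit(x, w):
    """Residual sum of squares from the OLS regression of w on x."""
    xbar, wbar = statistics.fmean(x), statistics.fmean(w)
    b2 = (sum((a - xbar) * (b - wbar) for a, b in zip(x, w))
          / sum((a - xbar) ** 2 for a in x))
    b1 = wbar - b2 * xbar
    return sum((b - b1 - b2 * a) ** 2 for a, b in zip(x, w))

# Hypothetical data: ln Y = 0.2 + 0.3 X exactly, so lambda = 0 should win.
x = list(range(1, 11))
y = [math.exp(0.2 + 0.3 * xi) for xi in x]

grid = [-1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
rss = {lam: rss_of_fit(x, scaled_boxcox(y, lam)) for lam in grid}
for lam, value in rss.items():
    print(f"lambda = {lam:5.1f}: RSS = {value:.6g}")
best = min(rss, key=rss.get)
print("lambda with minimum RSS:", best)
```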


¹G. E. P. Box and D. R. Cox, "An Analysis of Transformations," Journal of the Royal Statistical Society, B26, 1964, pp. 211-243.

²For an accessible discussion, refer to John Neter, Michael Kutner, Christopher Nachtsheim, and William Wasserman, Applied Linear Regression Models, 3rd ed., Richard D. Irwin, Chicago, 1996.


CHAPTER 7

Multiple Regression Analysis:
The Problem of Estimation


The two-variable model studied extensively in the previous chapters is often inadequate in practice. In our 
consumption-income example (Example 3.1), for instance, it was assumed implicitly that only income X is
related to consumption Y. But economic theory is seldom so simple for, besides income, a number of other 
variables are also likely to affect consumption expenditure. An obvious example is wealth of the consumer. 
As another example, the demand for a commodity is likely to depend not only on its own price but also on the 
prices of other competing or complementary goods, income of the consumer, social status, etc. Therefore, we 
need to extend our simple two-variable regression model to cover models involving more than two variables. 
Adding more variables leads us to the discussion of multiple regression models, that is, models in which the 
dependent variable, or regressand, Y depends on two or more explanatory variables, or regressors. 

The simplest possible multiple regression model is three-variable regression, with one dependent variable 
and two explanatory variables. In this and the next chapter we shall study this model. Throughout, we are 
concerned with multiple linear regression models, that is, models linear in the parameters; they may or may 
not be linear in the variables. 


7.1 The Three-Variable Model: Notation and Assumptions 


Generalizing the two-variable population regression function (PRF) Eq. (2.4.2), we may write the three-variable PRF as

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \qquad (7.1.1)$$

where Y is the dependent variable, $X_2$ and $X_3$ the explanatory variables (or regressors), u the stochastic disturbance term, and i the ith observation; in case the data are time series, the subscript t will denote the tth observation.¹

¹For notational symmetry, Eq. (7.1.1) can also be written as

$$Y_i = \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$

with the provision that $X_{1i} = 1$ for all i.
In Eq. (7.1.1) $\beta_1$ is the intercept term. As usual, it gives the mean or average effect on Y of all the variables excluded from the model, although its mechanical interpretation is the average value of Y when $X_2$ and $X_3$ are set equal to zero. The coefficients $\beta_2$ and $\beta_3$ are called the partial regression coefficients, and their meaning will be explained shortly.

We continue to operate within the framework of the classical linear regression model (CLRM) first introduced in Chapter 3. As a reminder, we assume the following:

Assumptions

1. Linear regression model, or linear in the parameters. (7.1.2)

2. Fixed X values or X values independent of the error term. Here, this means we require zero covariance between $u_i$ and each X variable:²
$$\operatorname{cov}(u_i, X_{2i}) = \operatorname{cov}(u_i, X_{3i}) = 0 \qquad (7.1.3)$$

3. Zero mean value of disturbance $u_i$:
$$E(u_i \mid X_{2i}, X_{3i}) = 0 \quad \text{for each } i \qquad (7.1.4)$$

4. Homoscedasticity or constant variance of $u_i$:
$$\operatorname{var}(u_i) = \sigma^2 \qquad (7.1.5)$$

5. No autocorrelation, or serial correlation, between the disturbances:
$$\operatorname{cov}(u_i, u_j) = 0, \quad i \neq j \qquad (7.1.6)$$

6. The number of observations n must be greater than the number of parameters to be estimated, which is 3 in our current case. (7.1.7)

7. There must be variation in the values of the X variables. (7.1.8)

We will also address two other requirements.

8. No exact collinearity between the X variables: no exact linear relationship between $X_2$ and $X_3$. (7.1.9)

9. There is no specification bias: the model is correctly specified. (7.1.10)

In Section 7.7, we will spend more time discussing the final assumption.


The rationale for assumptions (7.1.2) through (7.1.10) is the same as that discussed in Section 3.2. Assumption (7.1.9), that there is no exact linear relationship between $X_2$ and $X_3$, is technically known as the assumption of no collinearity, or no multicollinearity if more than one exact linear relationship is involved.

Informally, no collinearity means none of the regressors can be written as exact linear combinations of the remaining regressors in the model.

Formally, no collinearity means that there exists no set of numbers, $\lambda_2$ and $\lambda_3$, not both zero, such that

$$\lambda_2 X_{2i} + \lambda_3 X_{3i} = 0 \qquad (7.1.11)$$

If such an exact linear relationship exists, then $X_2$ and $X_3$ are said to be collinear or linearly dependent. On the other hand, if Eq. (7.1.11) holds true only when $\lambda_2 = \lambda_3 = 0$, then $X_2$ and $X_3$ are said to be linearly independent. Thus, if

$$X_{2i} = -4X_{3i} \quad \text{or} \quad X_{2i} + 4X_{3i} = 0 \qquad (7.1.12)$$

the two variables are linearly dependent, and if both are included in a regression model, we will have perfect collinearity or an exact linear relationship between the two regressors.

²This assumption is automatically fulfilled if $X_2$ and $X_3$ are nonstochastic and Eq. (7.1.4) holds.
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Although we shall consider the problem of multicollinearity in depth in Chapter 10, intuitively the logic 
behind the assumption of no multicollinearity is not too difficult to grasp. Suppose that in Eq. (7.1.1) Y, X>, 
and X, represent consumption expenditure, income, and wealth of the consumer, respectively. In postulating 
that consumption expenditure is linearly related to income and wealth, economic theory presumes that wealth 
and income may have some independent influence on consumption. If not, there is no sense in including both 
income and wealth variables in the model. In the extreme, if there is an exact linear relationship between 
income and wealth, we have only one independent variable, not two, and there is no way to assess the separate 
influence of income and wealth on consumption. To see this clearly, let X}; = 2X,, in the consumption- 
income—wealth regression. Then the regression (7.1.1) becomes 


$$\begin{aligned}
Y_i &= \beta_1 + \beta_2 X_{2i} + \beta_3 (2X_{2i}) + u_i \\
    &= \beta_1 + (\beta_2 + 2\beta_3) X_{2i} + u_i \\
    &= \beta_1 + \alpha X_{2i} + u_i
\end{aligned} \tag{7.1.13}$$

where $\alpha = (\beta_2 + 2\beta_3)$. That is, we in fact have a two-variable and not a three-variable regression. Moreover, if we run the regression (7.1.13) and obtain $\alpha$, there is no way to estimate the separate influence of $X_2$ ($= \beta_2$) and $X_3$ ($= \beta_3$) on $Y$, for $\alpha$ gives the combined influence of $X_2$ and $X_3$ on $Y$.³
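The collapse described above can be checked numerically. The sketch below is a minimal illustration, assuming made-up income and wealth figures; the helper names `deviations` and `cross_product_determinant` are hypothetical. It computes $(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i}x_{3i})^2$, the common denominator of the OLS formulas (7.4.7) and (7.4.8) given later in this chapter. When $X_{3i} = 2X_{2i}$ exactly, this quantity is zero, so no unique pair $\hat\beta_2, \hat\beta_3$ exists; a merely correlated $X_3$ leaves it positive.

```python
# Hypothetical income/wealth numbers, invented only to illustrate Eq. (7.1.13).

def deviations(values):
    """Deviations from the sample mean (the lowercase x's of the text)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cross_product_determinant(x2_raw, x3_raw):
    """(sum x2^2)(sum x3^2) - (sum x2*x3)^2, the denominator shared by the
    deviation-form OLS formulas for beta2-hat and beta3-hat."""
    x2, x3 = deviations(x2_raw), deviations(x3_raw)
    s22 = sum(a * a for a in x2)
    s33 = sum(b * b for b in x3)
    s23 = sum(a * b for a, b in zip(x2, x3))
    return s22 * s33 - s23 ** 2


income = [10.0, 12.0, 15.0, 18.0, 21.0, 25.0]                 # hypothetical X2
wealth_exact = [2.0 * v for v in income]                      # X3 = 2*X2 exactly
noise = [1.0, -2.0, 0.5, 3.0, -1.0, 2.0]
wealth_noisy = [2.0 * v + e for v, e in zip(income, noise)]   # correlated, not exact

print(cross_product_determinant(income, wealth_exact))  # zero: no unique solution
print(cross_product_determinant(income, wealth_noisy))  # positive: unique solution
```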

In short, the assumption of no multicollinearity requires that in the PRF we include only those variables 
that are not exact linear functions of one or more variables in the model. Although we will discuss this topic 
more fully in Chapter 10, a couple of points may be noted here. 

First, the assumption of no multicollinearity pertains to our theoretical (i.e., PRF) model. In practice, 
when we collect data for empirical analysis there is no guarantee that there will not be correlations among the 
regressors. As a matter of fact, in most applied work it is almost impossible to find two or more (economic) 
variables that may not be correlated to some extent, as we will show in our illustrative examples later in the 
chapter. What we require is that there be no exact linear relationships among the regressors, as in Eq. (7.1.12). 

Second, keep in mind that we are talking only about perfect linear relationships between two or more 
variables. The assumption of no multicollinearity does not rule out nonlinear relationships between variables. Suppose $X_{3i} = X_{2i}^2$.
This does not violate the assumption of no perfect collinearity, as the relationship between the variables here 
is nonlinear. 


7.2 Interpretation of Multiple Regression Equation 


Given the assumptions of the classical regression model, it follows that, on taking the conditional expectation 
of Y on both sides of Eq. (7.1.1), we obtain 
$$E(Y_i \mid X_{2i}, X_{3i}) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} \tag{7.2.1}$$


In words, Eq. (7.2.1) gives the conditional mean or expected value of Y conditional upon the given or 
fixed values of $X_2$ and $X_3$. Therefore, as in the two-variable case, multiple regression analysis is regression
analysis conditional upon the fixed values of the regressors, and what we obtain is the average or mean value 
of Y or the mean response of Y for the given values of the regressors. 


7.3 The Meaning of Partial Regression Coefficients 


As mentioned earlier, the regression coefficients $\beta_2$ and $\beta_3$ are known as partial regression or partial slope coefficients. The meaning of partial regression coefficient is as follows: $\beta_2$ measures the change in the mean


3Mathematically speaking, $\alpha = (\beta_2 + 2\beta_3)$ is one equation in two unknowns and there is no unique way of estimating $\beta_2$ and $\beta_3$ from the estimated $\alpha$.


206 Basic Econometrics 


value of $Y$, $E(Y)$, per unit change in $X_2$, holding the value of $X_3$ constant. Put differently, it gives the “direct” or the “net” effect of a unit change in $X_2$ on the mean value of $Y$, net of any effect that $X_3$ may have on mean $Y$. Likewise, $\beta_3$ measures the change in the mean value of $Y$ per unit change in $X_3$, holding the value of $X_2$ constant.⁴ That is, it gives the “direct” or “net” effect of a unit change in $X_3$ on the mean value of $Y$, net of any effect that $X_2$ may have on mean $Y$.⁵

How do we actually go about holding the influence of a regressor constant? To explain this, let us revert 
to our child mortality example (Example 6.6). Recall that in that example, $Y$ = child mortality (CM), $X_2$ = per capita GNP (PGNP), and $X_3$ = female literacy rate (FLR). Let us suppose we want to hold the influence
of FLR constant. Since FLR may have some effect on CM as well as PGNP in any given concrete data, what 
we can do is remove the (linear) influence of FLR from both CM and PGNP by running the regression of CM 
on FLR and of PGNP on FLR separately and then looking at the residuals obtained from these regressions. 
Using the data given in Table 6.4, we obtain the following regressions: 


$$\text{CM}_i = 263.8635 - 2.3905\,\text{FLR}_i + \hat u_{1i} \tag{7.3.1}$$
se = (12.2249) (0.2133)   $r^2 = 0.6695$

where $\hat u_{1i}$ represents the residual term of this regression.

$$\text{PGNP}_i = -39.3033 + 28.1427\,\text{FLR}_i + \hat u_{2i} \tag{7.3.2}$$
se = (734.9526) (12.8211)   $r^2 = 0.0721$

where $\hat u_{2i}$ represents the residual term of this regression.
Now

$$\hat u_{1i} = \text{CM}_i - 263.8635 + 2.3905\,\text{FLR}_i \tag{7.3.3}$$

represents that part of CM left after removing from it the (linear) influence of FLR. Likewise,

$$\hat u_{2i} = \text{PGNP}_i + 39.3033 - 28.1427\,\text{FLR}_i \tag{7.3.4}$$

represents that part of PGNP left after removing from it the (linear) influence of FLR.

Therefore, if we now regress $\hat u_{1i}$ on $\hat u_{2i}$, which are “purified” of the (linear) influence of FLR, wouldn’t we obtain the net effect of PGNP on CM? That is indeed the case (see Appendix 7A, Section 7A.2). The regression results are as follows:

$$\hat{\hat u}_{1i} = -0.0056\,\hat u_{2i} \tag{7.3.5}$$
se = (0.0019)   $r^2 = 0.1152$

Note: This regression has no intercept term because the mean value of the OLS residuals $\hat u_{1i}$ and $\hat u_{2i}$ is zero. (Why?)

The slope coefficient of −0.0056 now gives the “true” or net effect of a unit change in PGNP on CM or the true slope of CM with respect to PGNP. That is, it gives the partial regression coefficient of CM with respect to PGNP, $\beta_2$.

Readers who want to get the partial regression coefficient of CM with respect to FLR can replicate the above procedure by first regressing CM on PGNP and getting the residuals from this regression ($\hat u_{1i}$), then regressing FLR on PGNP and obtaining the residuals from this regression ($\hat u_{2i}$), and then regressing $\hat u_{1i}$ on $\hat u_{2i}$. I am sure readers get the idea.


4The calculus-minded reader will notice at once that $\beta_2$ and $\beta_3$ are the partial derivatives of $E(Y \mid X_2, X_3)$ with respect to $X_2$ and $X_3$.


5Incidentally, the terms holding constant, controlling for, allowing or accounting for the influence of, correcting the influence of, and sweeping out the influence of are synonymous and will be used interchangeably in this text.


Multiple Regression Analysis: The Problem of Estimation 207 


Do we have to go through this multistep procedure every time we want to find out the true partial regression 
coefficient? Fortunately, we do not have to do that, for the same job can be accomplished fairly quickly and 
routinely by the OLS procedure discussed in the next section. The multistep procedure just outlined is merely 
for pedagogic purposes to drive home the meaning of “partial” regression coefficient. 
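The multistep procedure can also be sketched in a few lines of Python. This is an illustration under stated assumptions, not the computation behind Eqs. (7.3.1) to (7.3.5): the data below are hypothetical (Table 6.4 is not reproduced here), and all helper names are invented for the example. The sketch purges both $Y$ and $X_2$ of the linear influence of $X_3$, regresses one set of residuals on the other without an intercept, and compares that slope with $\hat\beta_2$ from the three-variable formula (7.4.7).

```python
# Hypothetical data (NOT Table 6.4): y stands in for CM, x2 for PGNP, x3 for FLR.

def mean(v):
    return sum(v) / len(v)


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def residuals_on(y, x):
    """Residuals from the two-variable OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    xd = [a - mx for a in x]
    yd = [b - my for b in y]
    slope = dot(xd, yd) / dot(xd, xd)
    return [b - slope * a for a, b in zip(xd, yd)]   # y_i - intercept - slope * x_i


def slope_through_origin(y, x):
    """OLS slope of y on x with no intercept term."""
    return dot(x, y) / dot(x, x)


def beta2_three_variable(y, x2, x3):
    """beta2-hat from the deviation-form formula (7.4.7)."""
    my, m2, m3 = mean(y), mean(x2), mean(x3)
    yd = [v - my for v in y]
    d2 = [v - m2 for v in x2]
    d3 = [v - m3 for v in x3]
    num = dot(yd, d2) * dot(d3, d3) - dot(yd, d3) * dot(d2, d3)
    den = dot(d2, d2) * dot(d3, d3) - dot(d2, d3) ** 2
    return num / den


y = [120.0, 95.0, 160.0, 70.0, 140.0, 55.0, 180.0, 88.0]
x2 = [500.0, 900.0, 300.0, 1500.0, 450.0, 2000.0, 250.0, 1100.0]
x3 = [40.0, 55.0, 30.0, 80.0, 35.0, 90.0, 20.0, 60.0]

u1 = residuals_on(y, x3)    # y "purified" of the linear influence of x3
u2 = residuals_on(x2, x3)   # x2 "purified" of the linear influence of x3

print(slope_through_origin(u1, u2))   # partial slope from the residual regression
print(beta2_three_variable(y, x2, x3))  # the same value from the multiple regression
```

Because both residual series have zero mean, fitting the residual regression with or without an intercept gives the same slope, which is why the text's Eq. (7.3.5) omits it.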


7.4 OLS and ML Estimation of the Partial Regression Coefficients 


To estimate the parameters of the three-variable regression model (7.1.1), we first consider the method of 
ordinary least squares (OLS) introduced in Chapter 3 and then consider briefly the method of maximum 
likelihood (ML) discussed in Chapter 4. 


OLS Estimators 


To find the OLS estimators, let us first write the sample regression function (SRF) corresponding to the PRF 
of Eq. (7.1.1) as follows: 


$$Y_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + \hat u_i \tag{7.4.1}$$

where $\hat u_i$ is the residual term, the sample counterpart of the stochastic disturbance term $u_i$.
As noted in Chapter 3, the OLS procedure consists of choosing the values of the unknown parameters so that the residual sum of squares (RSS) $\sum \hat u_i^2$ is as small as possible. Symbolically,

$$\min \sum \hat u_i^2 = \sum \left(Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)^2 \tag{7.4.2}$$


where the expression for the RSS is obtained by simple algebraic manipulations of Eq. (7.4.1). 

The most straightforward procedure to obtain the estimators that will minimize Eq. (7.4.2) is to differen- 
tiate it with respect to the unknowns, set the resulting expressions to zero, and solve them simultaneously. As 
shown in Appendix 7A, Section 7A.1, this procedure gives the following normal equations [cf. Eqs. (3.1.4) 
and (3.1.5)]: 


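$$\bar Y = \hat\beta_1 + \hat\beta_2 \bar X_2 + \hat\beta_3 \bar X_3 \tag{7.4.3}$$
$$\sum Y_i X_{2i} = \hat\beta_1 \sum X_{2i} + \hat\beta_2 \sum X_{2i}^2 + \hat\beta_3 \sum X_{2i} X_{3i} \tag{7.4.4}$$
$$\sum Y_i X_{3i} = \hat\beta_1 \sum X_{3i} + \hat\beta_2 \sum X_{2i} X_{3i} + \hat\beta_3 \sum X_{3i}^2 \tag{7.4.5}$$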


From Eq. (7.4.3) we see at once that 
$$\hat\beta_1 = \bar Y - \hat\beta_2 \bar X_2 - \hat\beta_3 \bar X_3 \tag{7.4.6}$$
which is the OLS estimator of the population intercept $\beta_1$.


Following the convention of letting the lowercase letters denote deviations from sample mean values, one 
can derive the following formulas from the normal equations (7.4.3) to (7.4.5): 


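$$\hat\beta_2 = \frac{\left(\sum y_i x_{2i}\right)\left(\sum x_{3i}^2\right) - \left(\sum y_i x_{3i}\right)\left(\sum x_{2i} x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2} \tag{7.4.7}$$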


6This estimator is equal to that of Eq. (7.3.5), as shown in App. 7A, Sec. 7A.2. 
$$\hat\beta_3 = \frac{\left(\sum y_i x_{3i}\right)\left(\sum x_{2i}^2\right) - \left(\sum y_i x_{2i}\right)\left(\sum x_{2i} x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2} \tag{7.4.8}$$
which give the OLS estimators of the population partial regression coefficients $\beta_2$ and $\beta_3$, respectively.
In passing, note the following: (1) Equations (7.4.7) and (7.4.8) are symmetrical in nature because one can be obtained from the other by interchanging the roles of $X_2$ and $X_3$; (2) the denominators of these two equations are identical; and (3) the three-variable case is a natural extension of the two-variable case.
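To see that the deviation formulas and the normal equations are two routes to the same answer, the sketch below computes $\hat\beta_1, \hat\beta_2, \hat\beta_3$ both ways on a small hypothetical data set. `solve_linear` is a plain Gaussian-elimination helper written for this example, not a library routine; in real work one would use a numerical library.

```python
def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def ols3_deviation_form(y, x2, x3):
    """OLS estimates via Eqs. (7.4.6)-(7.4.8)."""
    n = len(y)
    ybar, m2, m3 = sum(y) / n, sum(x2) / n, sum(x3) / n
    yd = [v - ybar for v in y]
    d2 = [v - m2 for v in x2]
    d3 = [v - m3 for v in x3]
    s22, s33, s23 = dot(d2, d2), dot(d3, d3), dot(d2, d3)
    sy2, sy3 = dot(yd, d2), dot(yd, d3)
    den = s22 * s33 - s23 ** 2           # common denominator
    b2 = (sy2 * s33 - sy3 * s23) / den   # Eq. (7.4.7)
    b3 = (sy3 * s22 - sy2 * s23) / den   # Eq. (7.4.8)
    b1 = ybar - b2 * m2 - b3 * m3        # Eq. (7.4.6)
    return [b1, b2, b3]


def solve_linear(a, b):
    """Solve the square system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta


def ols3_normal_equations(y, x2, x3):
    """Solve the normal equations (7.4.3)-(7.4.5); (7.4.3) is used multiplied
    through by n, i.e., sum(Y) = n*b1 + b2*sum(X2) + b3*sum(X3)."""
    n = len(y)
    a = [[n, sum(x2), sum(x3)],
         [sum(x2), dot(x2, x2), dot(x2, x3)],
         [sum(x3), dot(x2, x3), dot(x3, x3)]]
    rhs = [sum(y), dot(y, x2), dot(y, x3)]
    return solve_linear(a, rhs)


# Hypothetical data for illustration only.
y = [3.1, 3.9, 6.2, 6.8, 9.9, 10.1, 13.2, 14.8]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0]

print(ols3_deviation_form(y, x2, x3))
print(ols3_normal_equations(y, x2, x3))   # agrees up to rounding
```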


Variances and Standard Errors of OLS Estimators 


Having obtained the OLS estimators of the partial regression coefficients, we can derive the variances and 
standard errors of these estimators in the manner indicated in Appendix 3A.3. As in the two-variable case, 
we need the standard errors for two main purposes: to establish confidence intervals and to test statistical 
hypotheses. The relevant formulas are as follows:⁷


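$$\operatorname{var}(\hat\beta_1) = \left[\frac{1}{n} + \frac{\bar X_2^2 \sum x_{3i}^2 + \bar X_3^2 \sum x_{2i}^2 - 2\bar X_2 \bar X_3 \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2}\right] \sigma^2 \tag{7.4.9}$$

$$\operatorname{se}(\hat\beta_1) = +\sqrt{\operatorname{var}(\hat\beta_1)} \tag{7.4.10}$$

$$\operatorname{var}(\hat\beta_2) = \frac{\sum x_{3i}^2}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2}\,\sigma^2 \tag{7.4.11}$$

or, equivalently,

$$\operatorname{var}(\hat\beta_2) = \frac{\sigma^2}{\sum x_{2i}^2 \left(1 - r_{23}^2\right)} \tag{7.4.12}$$

where $r_{23}$ is the sample coefficient of correlation between $X_2$ and $X_3$ as defined in Chapter 3.⁸

$$\operatorname{se}(\hat\beta_2) = +\sqrt{\operatorname{var}(\hat\beta_2)} \tag{7.4.13}$$

$$\operatorname{var}(\hat\beta_3) = \frac{\sum x_{2i}^2}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2}\,\sigma^2 \tag{7.4.14}$$

or, equivalently,

$$\operatorname{var}(\hat\beta_3) = \frac{\sigma^2}{\sum x_{3i}^2 \left(1 - r_{23}^2\right)} \tag{7.4.15}$$

$$\operatorname{se}(\hat\beta_3) = +\sqrt{\operatorname{var}(\hat\beta_3)} \tag{7.4.16}$$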


7The derivations of these formulas are easier using matrix notation. Advanced readers may refer to Appendix C.
8Using the definition of $r$ given in Chapter 3, we have

$$r_{23}^2 = \frac{\left(\sum x_{2i} x_{3i}\right)^2}{\sum x_{2i}^2 \sum x_{3i}^2}$$
$$\operatorname{cov}(\hat\beta_2, \hat\beta_3) = \frac{-r_{23}\,\sigma^2}{\left(1 - r_{23}^2\right)\sqrt{\sum x_{2i}^2}\,\sqrt{\sum x_{3i}^2}} \tag{7.4.17}$$


In all these formulas $\sigma^2$ is the (homoscedastic) variance of the population disturbances $u_i$.
Following the argument of Appendix 3A, Section 3A.5, the reader can verify that an unbiased estimator of $\sigma^2$ is given by

$$\hat\sigma^2 = \frac{\sum \hat u_i^2}{n - 3} \tag{7.4.18}$$

Note the similarity between this estimator of $\sigma^2$ and its two-variable counterpart [$\hat\sigma^2 = (\sum \hat u_i^2)/(n - 2)$]. The degrees of freedom are now $(n - 3)$ because in estimating $\sum \hat u_i^2$ we must first estimate $\beta_1$, $\beta_2$, and $\beta_3$, which consume 3 df. (The argument is quite general. Thus, in the four-variable case the df will be $n - 4$.)

The estimator ô? can be computed from Eq. (7.4.18) once the residuals are available, but it can also be 
obtained more readily by using the following relation (for proof, see Appendix 7A, Section 7A.3): 


$$\sum \hat u_i^2 = \sum y_i^2 - \hat\beta_2 \sum y_i x_{2i} - \hat\beta_3 \sum y_i x_{3i} \tag{7.4.19}$$


which is the three-variable counterpart of the relation given in Eq. (3.3.6). 
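As a rough check on Eqs. (7.4.11), (7.4.12), (7.4.15), (7.4.18), and (7.4.19), the sketch below fits the three-variable model to hypothetical data, computes the RSS both directly from the residuals and by the shortcut (7.4.19), and then forms $\hat\sigma^2$ and the standard errors with $\hat\sigma^2$ substituted for the unknown $\sigma^2$. The function name `ols3_with_errors` and the data are assumptions made for illustration.

```python
import math


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def ols3_with_errors(y, x2, x3):
    """Three-variable OLS slopes with sigma^2-hat and standard errors (hypothetical helper)."""
    n = len(y)
    yd, d2, d3 = center(y), center(x2), center(x3)
    s22, s33, s23 = dot(d2, d2), dot(d3, d3), dot(d2, d3)
    sy2, sy3 = dot(yd, d2), dot(yd, d3)
    den = s22 * s33 - s23 ** 2
    b2 = (sy2 * s33 - sy3 * s23) / den                 # Eq. (7.4.7)
    b3 = (sy3 * s22 - sy2 * s23) / den                 # Eq. (7.4.8)
    resid = [a - b2 * p - b3 * q for a, p, q in zip(yd, d2, d3)]  # from Eq. (7.4.24)
    rss = dot(resid, resid)
    rss_shortcut = dot(yd, yd) - b2 * sy2 - b3 * sy3   # Eq. (7.4.19)
    sigma2_hat = rss / (n - 3)                         # Eq. (7.4.18)
    r23_sq = s23 ** 2 / (s22 * s33)
    var_b2 = sigma2_hat / (s22 * (1 - r23_sq))         # Eq. (7.4.12), sigma^2 estimated
    var_b3 = sigma2_hat / (s33 * (1 - r23_sq))         # Eq. (7.4.15), sigma^2 estimated
    return {
        "b2": b2, "b3": b3,
        "rss": rss, "rss_shortcut": rss_shortcut,
        "sigma2_hat": sigma2_hat,
        "var_b2": var_b2,
        "var_b2_alt": sigma2_hat * s33 / den,          # Eq. (7.4.11), same quantity
        "se_b2": math.sqrt(var_b2), "se_b3": math.sqrt(var_b3),
    }


y = [3.1, 3.9, 6.2, 6.8, 9.9, 10.1, 13.2, 14.8]   # hypothetical data
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0]

res = ols3_with_errors(y, x2, x3)
print(res["rss"], res["rss_shortcut"])   # equal up to rounding
print(res["se_b2"], res["se_b3"])
```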


Properties of OLS Estimators 


The properties of OLS estimators of the multiple regression model parallel those of the two-variable model. 
Specifically:

1. The three-variable regression line (surface) passes through the means $\bar Y$, $\bar X_2$, and $\bar X_3$, which is evident from Eq. (7.4.3) (cf. Eq. [3.1.7] of the two-variable model). This property holds generally. Thus in the $k$-variable linear regression model (a regressand and $[k - 1]$ regressors)

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \tag{7.4.20}$$

we have

$$\hat\beta_1 = \bar Y - \hat\beta_2 \bar X_2 - \hat\beta_3 \bar X_3 - \cdots - \hat\beta_k \bar X_k \tag{7.4.21}$$


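2. The mean value of the estimated $Y_i$ ($= \hat Y_i$) is equal to the mean value of the actual $Y_i$, which is easy to prove:

$$\begin{aligned}
\hat Y_i &= \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} \\
         &= (\bar Y - \hat\beta_2 \bar X_2 - \hat\beta_3 \bar X_3) + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} \quad \text{(Why?)} \\
         &= \bar Y + \hat\beta_2 (X_{2i} - \bar X_2) + \hat\beta_3 (X_{3i} - \bar X_3) \\
         &= \bar Y + \hat\beta_2 x_{2i} + \hat\beta_3 x_{3i}
\end{aligned} \tag{7.4.22}$$

where, as usual, small letters indicate values of the variables as deviations from their respective means. Summing both sides of Eq. (7.4.22) over the sample values and dividing through by the sample size $n$ gives $\bar{\hat Y} = \bar Y$. (Note: $\sum x_{2i} = \sum x_{3i} = 0$. Why?) Notice that by virtue of Eq. (7.4.22) we can write

$$\hat y_i = \hat\beta_2 x_{2i} + \hat\beta_3 x_{3i} \tag{7.4.23}$$

where $\hat y_i = (\hat Y_i - \bar Y)$.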
Therefore, the SRF (7.4.1) can be expressed in the deviation form as

$$y_i = \hat y_i + \hat u_i = \hat\beta_2 x_{2i} + \hat\beta_3 x_{3i} + \hat u_i \tag{7.4.24}$$

3. $\sum \hat u_i = 0$, which can be verified from Eq. (7.4.24). (Hint: Sum both sides of Eq. [7.4.24] over the sample values.)

4. The residuals $\hat u_i$ are uncorrelated with $X_{2i}$ and $X_{3i}$, that is, $\sum \hat u_i X_{2i} = \sum \hat u_i X_{3i} = 0$ (see Appendix 7A.1 for proof).

5. The residuals $\hat u_i$ are uncorrelated with $\hat Y_i$; that is, $\sum \hat u_i \hat Y_i = 0$. Why? (Hint: Multiply Eq. [7.4.23] on both sides by $\hat u_i$ and sum over the sample values.)

6. From Eqs. (7.4.12) and (7.4.15) it is evident that as $r_{23}$, the correlation coefficient between $X_2$ and $X_3$, increases toward 1, the variances of $\hat\beta_2$ and $\hat\beta_3$ increase for given values of $\sigma^2$ and $\sum x_{2i}^2$ or $\sum x_{3i}^2$. In the limit, when $r_{23} = 1$ (i.e., perfect collinearity), these variances become infinite. The implications of this will be explored fully in Chapter 10, but intuitively the reader can see that as $r_{23}$ increases it is going to be increasingly difficult to know what the true values of $\beta_2$ and $\beta_3$ are. (More on this in the next chapter, but refer to Eq. [7.1.13].)

7. It is also clear from Eqs. (7.4.12) and (7.4.15) that for given values of $r_{23}$ and $\sum x_{2i}^2$ or $\sum x_{3i}^2$, the variances of the OLS estimators are directly proportional to $\sigma^2$; that is, they increase as $\sigma^2$ increases. Similarly, for given values of $\sigma^2$ and $r_{23}$, the variance of $\hat\beta_2$ is inversely proportional to $\sum x_{2i}^2$; that is, the greater the variation in the sample values of $X_2$, the smaller the variance of $\hat\beta_2$ and therefore $\beta_2$ can be estimated more precisely. A similar statement can be made about the variance of $\hat\beta_3$.

8. Given the assumptions of the classical linear regression model, which are spelled out in Section 7.1, 
one can prove that the OLS estimators of the partial regression coefficients not only are linear and unbiased 
but also have minimum variance in the class of all linear unbiased estimators. In short, they are BLUE. Put 
differently, they satisfy the Gauss—Markov theorem. (The proof parallels the two-variable case proved in 
Appendix 3A, Section 3A.6 and will be presented more compactly using matrix notation in Appendix C.) 
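Properties 1 through 5 are algebraic facts about any OLS fit with an intercept, so they can be verified on any data set. The sketch below uses invented numbers and a hypothetical helper `fit3` that implements Eqs. (7.4.6) to (7.4.8); each printed quantity should be zero apart from floating-point rounding.

```python
def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def mean(v):
    return sum(v) / len(v)


def fit3(y, x2, x3):
    """(b1, b2, b3) from Eqs. (7.4.6)-(7.4.8)."""
    my, m2, m3 = mean(y), mean(x2), mean(x3)
    yd = [v - my for v in y]
    d2 = [v - m2 for v in x2]
    d3 = [v - m3 for v in x3]
    s22, s33, s23 = dot(d2, d2), dot(d3, d3), dot(d2, d3)
    den = s22 * s33 - s23 ** 2
    b2 = (dot(yd, d2) * s33 - dot(yd, d3) * s23) / den
    b3 = (dot(yd, d3) * s22 - dot(yd, d2) * s23) / den
    return my - b2 * m2 - b3 * m3, b2, b3


y = [12.0, 15.5, 14.0, 20.1, 22.4, 25.0, 24.3, 30.2]   # hypothetical data
x2 = [3.0, 5.0, 4.0, 7.0, 8.0, 9.0, 9.0, 12.0]
x3 = [1.0, 2.5, 2.0, 2.0, 3.5, 4.0, 3.0, 5.5]

b1, b2, b3 = fit3(y, x2, x3)
y_hat = [b1 + b2 * p + b3 * q for p, q in zip(x2, x3)]
u_hat = [a - f for a, f in zip(y, y_hat)]

checks = {
    "property 1: surface at the means minus mean(Y)":
        b1 + b2 * mean(x2) + b3 * mean(x3) - mean(y),
    "property 2: mean(Y_hat) - mean(Y)": mean(y_hat) - mean(y),
    "property 3: sum of residuals": sum(u_hat),
    "property 4: sum u_hat * X2": dot(u_hat, x2),
    "property 4: sum u_hat * X3": dot(u_hat, x3),
    "property 5: sum u_hat * Y_hat": dot(u_hat, y_hat),
}
for label, value in checks.items():
    print(f"{label}: {value:.1e}")   # each is zero up to floating-point rounding
```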


Maximum Likelihood Estimators 


We noted in Chapter 4 that under the assumption that $u_i$, the population disturbances, are normally distributed with zero mean and constant variance $\sigma^2$, the maximum likelihood (ML) estimators and the OLS estimators of the regression coefficients of the two-variable model are identical. This equality extends to models containing any number of variables. (For proof, see Appendix 7A, Section 7A.4.) However, this is not true of the estimator of $\sigma^2$. It can be shown that the ML estimator of $\sigma^2$ is $\sum \hat u_i^2/n$ regardless of the number of variables in the model, whereas the OLS estimator of $\sigma^2$ is $\sum \hat u_i^2/(n - 2)$ in the two-variable case, $\sum \hat u_i^2/(n - 3)$ in the three-variable case, and $\sum \hat u_i^2/(n - k)$ in the case of the $k$-variable model (7.4.20). In short, the OLS estimator of $\sigma^2$ takes into account the number of degrees of freedom, whereas the ML estimator does not. Of course, if $n$ is very large, the ML and OLS estimators of $\sigma^2$ will tend to be close to each other. (Why?)
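The two estimators of $\sigma^2$ differ only in the divisor, so their ratio is $(n - k)/n$ whatever the residuals are. The short sketch below, with a hypothetical RSS chosen for each sample size, shows that ratio approaching 1 as $n$ grows, which is one way to answer the question above. The function name `sigma2_estimates` is illustrative.

```python
def sigma2_estimates(rss, n, k):
    """(unbiased OLS estimator, ML estimator) of sigma^2 from the same RSS."""
    return rss / (n - k), rss / n


for n in (10, 64, 1000, 100_000):
    rss = 2.5 * (n - 3)                      # hypothetical RSS for a k = 3 model
    s2_ols, s2_ml = sigma2_estimates(rss, n, k=3)
    print(n, s2_ols, round(s2_ml, 4), s2_ml / s2_ols)   # ratio (n - 3)/n tends to 1
```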


7.5 The Multiple Coefficient of Determination $R^2$
and the Multiple Coefficient of Correlation R 


In the two-variable case we saw that $r^2$ as defined in Eq. (3.5.5) measures the goodness of fit of the regression equation; that is, it gives the proportion or percentage of the total variation in the dependent variable $Y$ explained by the (single) explanatory variable $X$. This notion of $r^2$ can be easily extended to regression models containing more than two variables. Thus, in the three-variable model we would like to know the proportion of the variation in $Y$ explained by the variables $X_2$ and $X_3$ jointly. The quantity that gives this information is known as the multiple coefficient of determination and is denoted by $R^2$; conceptually it is akin to $r^2$.


To derive $R^2$, we may follow the derivation of $r^2$ given in Section 3.5. Recall that

$$\begin{aligned}
Y_i &= \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + \hat u_i \\
    &= \hat Y_i + \hat u_i
\end{aligned} \tag{7.5.1}$$

where $\hat Y_i$ is the estimated value of $Y_i$ from the fitted regression line and is an estimator of true $E(Y_i \mid X_{2i}, X_{3i})$.


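Upon shifting to lowercase letters to indicate deviations from the mean values, Eq. (7.5.1) may be written as

$$\begin{aligned}
y_i &= \hat\beta_2 x_{2i} + \hat\beta_3 x_{3i} + \hat u_i \\
    &= \hat y_i + \hat u_i
\end{aligned} \tag{7.5.2}$$

Squaring Eq. (7.5.2) on both sides and summing over the sample values, we obtain

$$\begin{aligned}
\sum y_i^2 &= \sum \hat y_i^2 + \sum \hat u_i^2 + 2 \sum \hat y_i \hat u_i \\
           &= \sum \hat y_i^2 + \sum \hat u_i^2 \quad \text{(Why?)}
\end{aligned} \tag{7.5.3}$$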


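Verbally, Eq. (7.5.3) states that the total sum of squares (TSS) equals the explained sum of squares (ESS) plus the residual sum of squares (RSS). Now substituting for $\sum \hat u_i^2$ from Eq. (7.4.19), we obtain

$$\sum y_i^2 = \sum \hat y_i^2 + \sum y_i^2 - \hat\beta_2 \sum y_i x_{2i} - \hat\beta_3 \sum y_i x_{3i}$$

which, on rearranging, gives

$$\text{ESS} = \sum \hat y_i^2 = \hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i} \tag{7.5.4}$$

Now, by definition⁹

$$R^2 = \frac{\text{ESS}}{\text{TSS}} = \frac{\hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i}}{\sum y_i^2} \tag{7.5.5}$$

(cf. Eq. [7.5.5] with Eq. [3.5.6]).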

Since the quantities entering Eq. (7.5.5) are generally computed routinely, $R^2$ can be computed easily. Note that $R^2$, like $r^2$, lies between 0 and 1. If it is 1, the fitted regression line explains 100 percent of the variation in $Y$. On the other hand, if it is 0, the model does not explain any of the variation in $Y$. Typically, however, $R^2$ lies between these extreme values. The fit of the model is said to be “better” the closer $R^2$ is to 1.

Recall that in the two-variable case we defined the quantity $r$ as the coefficient of correlation and indicated that it measures the degree of (linear) association between two variables. The three-or-more-variable analogue of $r$ is the coefficient of multiple correlation, denoted by $R$, and it is a measure of the degree of association between $Y$ and all the explanatory variables jointly. Although $r$ can be positive or negative, $R$ is always taken to be positive. In practice, however, $R$ is of little importance. The more meaningful quantity is $R^2$.
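The sketch below, on hypothetical data, computes $R^2$ from Eq. (7.5.5) and again as $1 - \text{RSS}/\text{TSS}$ (the alternative form noted in the footnote to Eq. [7.5.5]), and takes $R$ as the positive square root. As an extra check it uses a standard identity not stated in the text: when the model contains an intercept, $R^2$ also equals the squared simple correlation between $Y_i$ and $\hat Y_i$. The helper name `r_squared_report` is invented for the example.

```python
import math


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def r_squared_report(y, x2, x3):
    """R^2 via Eq. (7.5.5), via 1 - RSS/TSS, and as corr(Y, Y_hat)^2."""
    yd, d2, d3 = center(y), center(x2), center(x3)
    s22, s33, s23 = dot(d2, d2), dot(d3, d3), dot(d2, d3)
    sy2, sy3 = dot(yd, d2), dot(yd, d3)
    den = s22 * s33 - s23 ** 2
    b2 = (sy2 * s33 - sy3 * s23) / den
    b3 = (sy3 * s22 - sy2 * s23) / den
    tss = dot(yd, yd)                                    # sum of y_i^2
    ess = b2 * sy2 + b3 * sy3                            # Eq. (7.5.4)
    yhat_d = [b2 * p + b3 * q for p, q in zip(d2, d3)]   # y_hat_i, Eq. (7.4.23)
    resid = [a - f for a, f in zip(yd, yhat_d)]
    rss = dot(resid, resid)
    r2_ess = ess / tss                                   # Eq. (7.5.5)
    r2_rss = 1 - rss / tss                               # 1 - RSS/TSS
    corr_sq = dot(yd, yhat_d) ** 2 / (tss * dot(yhat_d, yhat_d))
    return r2_ess, r2_rss, corr_sq


y = [12.0, 15.5, 14.0, 20.1, 22.4, 25.0, 24.3, 30.2]   # hypothetical data
x2 = [3.0, 5.0, 4.0, 7.0, 8.0, 9.0, 9.0, 12.0]
x3 = [1.0, 2.5, 2.0, 2.0, 3.5, 4.0, 3.0, 5.5]

r2_ess, r2_rss, corr_sq = r_squared_report(y, x2, x3)
R = math.sqrt(r2_ess)     # coefficient of multiple correlation, taken positive
print(r2_ess, r2_rss, corr_sq, R)
```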


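9Note that $R^2$ can also be computed as follows:

$$R^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\sum \hat u_i^2}{\sum y_i^2} = 1 - \frac{(n - 3)\,\hat\sigma^2}{(n - 1)\,S_y^2}$$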
Before proceeding further, let us note the following relationship between $R^2$ and the variance of a partial regression coefficient in the $k$-variable multiple regression model given in Eq. (7.4.20):

$$\operatorname{var}(\hat\beta_j) = \frac{\sigma^2}{\sum x_j^2}\left(\frac{1}{1 - R_j^2}\right) \tag{7.5.6}$$

where $\hat\beta_j$ is the partial regression coefficient of regressor $X_j$ and $R_j^2$ is the $R^2$ in the regression of $X_j$ on the remaining $(k - 2)$ regressors. (Note: There are $[k - 1]$ regressors in the $k$-variable regression model.) Although the utility of Eq. (7.5.6) will become apparent in Chapter 10 on multicollinearity, observe that this equation is simply an extension of the formula given in Eq. (7.4.12) or Eq. (7.4.15) for the three-variable regression model, one regressand and two regressors.
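In the three-variable model the “remaining regressors” of Eq. (7.5.6) reduce to a single variable, so $R_2^2$ comes from regressing $X_2$ on $X_3$ alone. The sketch below, assuming hypothetical regressors and a value of $\sigma^2$ chosen purely for illustration, computes $R_2^2$ from the residuals of that auxiliary regression and confirms that Eq. (7.5.6) reproduces the variance given by Eq. (7.4.11). The helper `auxiliary_r2` is an invented name.

```python
def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def auxiliary_r2(target, other):
    """R^2 from regressing one regressor on the other (with intercept),
    computed as 1 - RSS/TSS from the actual residuals."""
    td, od = center(target), center(other)
    slope = dot(od, td) / dot(od, od)
    resid = [t - slope * o for t, o in zip(td, od)]
    return 1 - dot(resid, resid) / dot(td, td)


x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # hypothetical regressors
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0]
sigma2 = 4.0                                     # assumed disturbance variance

d2, d3 = center(x2), center(x3)
s22, s33, s23 = dot(d2, d2), dot(d3, d3), dot(d2, d3)

var_b2_eq7411 = sigma2 * s33 / (s22 * s33 - s23 ** 2)   # Eq. (7.4.11)
r2_2 = auxiliary_r2(x2, x3)                              # R_2^2 in Eq. (7.5.6)
var_b2_eq756 = (sigma2 / s22) * (1 / (1 - r2_2))         # Eq. (7.5.6) with j = 2
print(var_b2_eq7411, var_b2_eq756, 1 / (1 - r2_2))       # last value: inflation factor
```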


7.6 An Illustrative Example 


Example 7.1 Child Mortality in Relation to per Capita GNP and Female Literacy Rate 


In Chapter 6 we considered the behavior of child mortality (CM) in relation to per capita GNP (PGNP). There 
we found that PGNP has a negative impact on CM, as one would expect. Now let us bring in female literacy 
as measured by the female literacy rate (FLR). A priori, we expect that FLR too will have a negative impact on 
CM. Now when we introduce both the variables in our model, we need to net out the influence of each of 
the regressors. That is, we need to estimate the (partial) regression coefficients of each regressor. Thus our 
model is: 


$$\text{CM}_i = \beta_1 + \beta_2\,\text{PGNP}_i + \beta_3\,\text{FLR}_i + u_i \tag{7.6.1}$$


The necessary data are given in Table 6.4. Keep in mind that CM is the number of deaths of children under 
five per 1000 live births, PGNP is per capita GNP in 1980, and FLR is measured in percent. Our sample consists 
of 64 countries. 

Using the EViews6 statistical package, we obtained the following results: 


$$\widehat{\text{CM}}_i = 263.6416 - 0.0056\,\text{PGNP}_i - 2.2316\,\text{FLR}_i \tag{7.6.2}$$
se = (11.5932) (0.0019) (0.2099)   $R^2 = 0.7077$
$\bar R^2 = 0.6981$*
where figures in parentheses are the estimated standard errors. Before we interpret this regression, observe 
the partial slope coefficient of PGNP, namely, -0.0056. Is it not precisely the same as that obtained from the 
three-step procedure discussed in the previous section (see Eq. [7.3.5])? But should that surprise you? Not 
only that, but the two standard errors are precisely the same, which is again unsurprising. And we obtained these results without the cumbersome three-step procedure.

Let us now interpret these regression coefficients: -0.0056 is the partial regression coefficient of PGNP and 
tells us that with the influence of FLR held constant, as PGNP increases, say, by a dollar, on average, child 
mortality goes down by 0.0056 units. To make it more economically interpretable, if the per capita GNP goes 
up by a thousand dollars, on average, the number of deaths of children under age 5 goes down by about 
5.6 per thousand live births. The coefficient -2.2316 tells us that holding the influence of PGNP constant, on 
average, the number of deaths of children under age 5 goes down by about 2.23 per thousand live births as 
the female literacy rate increases by one percentage point. The intercept value of about 263, mechanically 
interpreted, means that if the values of PGNP and FLR were fixed at zero, the mean child mortality rate


*On this, see Section 7.8. 
would be about 263 deaths per thousand live births. Of course, such an interpretation should be taken with a grain of salt. All one could infer is that if the two regressors were fixed at zero, child mortality would be quite high, which makes practical sense. The $R^2$ value of about 0.71 means that about 71 percent of the variation in child mortality is explained by PGNP and FLR, a fairly high value considering that the maximum value of $R^2$ is 1. All told, the regression results make sense.

What about the statistical significance of the estimated coefficients? We will take this topic up in Chapter 8. As we will see there, in many ways that chapter will be an extension of Chapter 5, which dealt with the two-variable model. As we will also show, there are some important differences in statistical inference (i.e., hypothesis testing) between the two-variable and multivariable regression models.


Regression on Standardized Variables 


In the preceding chapter we introduced the topic of regression on standardized variables and stated that the 
analysis can be extended to multivariable regressions. Recall that a variable is said to be standardized or in 
standard deviation units if it is expressed in terms of deviation from its mean and divided by its standard 
deviation. 

For our child mortality example, the results are as follows: 


$$\widehat{\text{CM}}^* = -0.2026\,\text{PGNP}^* - 0.7639\,\text{FLR}^* \tag{7.6.3}$$
se = (0.0713) (0.0713)   $R^2 = 0.7077$


Note: The starred variables are standardized variables. Also note that there is no intercept in the model for 
reasons already discussed in the previous chapter. 

As you can see from this regression, with FLR held constant, a standard deviation increase in PGNP leads, 
on average, to a 0.2026 standard deviation decrease in CM. Similarly, holding PGNP constant, a standard 
deviation increase in FLR, on average, leads to a 0.7639 standard deviation decrease in CM. Relatively 
speaking, female literacy has more impact on child mortality than per capita GNP. Here you will see the 
advantage of using standardized variables, for standardization puts all variables on equal footing because all 
standardized variables have zero means and unit variances. 
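Standardized coefficients can be obtained without refitting: each equals the ordinary slope multiplied by $s_{X_j}/s_Y$, the ratio of sample standard deviations. The sketch below, on hypothetical data, fits the model once in original units and once on standardized variables and compares the two; because every standardized variable has mean zero, the fitted intercept is zero, as in Eq. (7.6.3). It uses only Python's standard `statistics` module, and the helper names are invented for the example.

```python
import statistics


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def slopes(y, x2, x3):
    """(b2, b3) from the deviation-form OLS formulas (7.4.7)-(7.4.8)."""
    yd, d2, d3 = center(y), center(x2), center(x3)
    s22, s33, s23 = dot(d2, d2), dot(d3, d3), dot(d2, d3)
    den = s22 * s33 - s23 ** 2
    b2 = (dot(yd, d2) * s33 - dot(yd, d3) * s23) / den
    b3 = (dot(yd, d3) * s22 - dot(yd, d2) * s23) / den
    return b2, b3


def standardize(v):
    """(value - mean) / sample standard deviation."""
    m, s = statistics.mean(v), statistics.stdev(v)
    return [(a - m) / s for a in v]


y = [12.0, 15.5, 14.0, 20.1, 22.4, 25.0, 24.3, 30.2]   # hypothetical data
x2 = [3.0, 5.0, 4.0, 7.0, 8.0, 9.0, 9.0, 12.0]
x3 = [1.0, 2.5, 2.0, 2.0, 3.5, 4.0, 3.0, 5.5]

b2, b3 = slopes(y, x2, x3)                                          # original units
z2, z3 = slopes(standardize(y), standardize(x2), standardize(x3))   # standardized
sy, s2, s3 = statistics.stdev(y), statistics.stdev(x2), statistics.stdev(x3)

print(z2, b2 * s2 / sy)   # the two numbers agree
print(z3, b3 * s3 / sy)
```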


Impact on the Dependent Variable of a Unit Change in More than One 
Regressor 


Before proceeding further, suppose we want to find out what would happen to the child mortality rate if we 
were to increase PGNP and FLR simultaneously. Suppose per capita GNP were to increase by a dollar and at 
the same time the female literacy rate were to go up by one percentage point. What would be the impact of 
this simultaneous change on the child mortality rate? To find out, all we have to do is multiply the coefficients 
of PGNP and FLR by the proposed changes and add the resulting terms. In our example this gives us: 


$$-0.0056(1) - 2.2316(1) = -2.2372$$


That is, as a result of this simultaneous change in PGNP and FLR, the number of deaths of children under age 5 would go down by about 2.24 per thousand live births.

More generally, if we want to find out the total impact on the dependent variable of a unit change in more 
than one regressor, all we have to do is multiply the coefficients of those regressors by the proposed changes 
and add up the products. Note that the intercept term does not enter into these calculations. (Why?) 
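The arithmetic of this subsection, and of the interpretation of Eq. (7.6.2) given earlier, can be reproduced directly from the reported coefficients. The sketch below is a minimal illustration; the function name `change_in_mean_cm` is hypothetical, and the intercept is deliberately left out, in line with the remark above.

```python
# Slope coefficients as reported in Eq. (7.6.2); the intercept is not needed.
B_PGNP = -0.0056   # per dollar of per capita GNP
B_FLR = -2.2316    # per percentage point of female literacy


def change_in_mean_cm(d_pgnp, d_flr):
    """Change in mean child mortality (deaths per 1000 live births)
    for given changes in PGNP (dollars) and FLR (percentage points)."""
    return B_PGNP * d_pgnp + B_FLR * d_flr


print(change_in_mean_cm(1, 1))      # about -2.2372: both regressors up by one unit
print(change_in_mean_cm(1000, 0))   # about -5.6: PGNP up by $1000, FLR held constant
```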




7.7 Simple Regression in the Context of Multiple Regression: 
Introduction to Specification Bias 


Recall that assumption (7.1.10) of the classical linear regression model states that the regression model used 
in the analysis is “correctly” specified; that is, there is no specification bias or specification error (see 
Chapter 3 for some introductory remarks). Although the topic of specification error will be discussed more 
fully in Chapter 13, the illustrative example given in the preceding section provides a splendid opportunity 
not only to drive home the importance of assumption (7.1.10) but also to shed additional light on the meaning 
of partial regression coefficient and provide a somewhat informal introduction to the topic of specification
bias. 

Assume that Eq. (7.6.1) is the “true” model explaining the behavior of child mortality in relation to per 
capita GNP and female literacy rate (FLR). But suppose we disregard FLR and estimate the following simple 
regression: 


$$Y_i = \alpha_1 + \alpha_2 X_{2i} + u_{1i} \tag{7.7.1}$$

where $Y$ = CM and $X_2$ = PGNP.

Since Eq. (7.6.1) is the true model, estimating Eq. (7.7.1) would constitute a specification error; the error here consists in omitting the variable $X_3$, the female literacy rate. Notice that we are using different parameter symbols (the alphas) in Eq. (7.7.1) to distinguish them from the true parameters (the betas) given in Eq. (7.6.1).

Now will $\alpha_2$ provide an unbiased estimate of the true impact of PGNP, which is given by $\beta_2$ in model (7.6.1)? Will $E(\hat\alpha_2) = \beta_2$, where $\hat\alpha_2$ is the estimated value of $\alpha_2$? In other words, will the coefficient of PGNP in Eq. (7.7.1) provide an unbiased estimate of the true impact of PGNP on CM, knowing that we have omitted the variable $X_3$ (FLR) from the model? As you would suspect, in general, $\hat\alpha_2$ will not be an unbiased estimator of the true $\beta_2$. To give a glimpse of the bias, let us run the regression (7.7.1), which gave the following results.


$$\widehat{\text{CM}}_i = 157.4244 - 0.0114\,\text{PGNP}_i \tag{7.7.2}$$
se = (9.8455) (0.0032)   $r^2 = 0.1662$

Observe several things about this regression compared to the “true” multiple regression (7.6.1):


1. In absolute terms (i.e., disregarding the sign), the PGNP coefficient has increased from 0.0056 to 0.0114, almost a two-fold increase.

2. The standard errors are different.

3. The intercept values are different.

4. The $r^2$ values are different, although it is generally the case that, as the number of regressors in the model increases, the $r^2$ value increases.


Now suppose that you regress child mortality on female literacy rate, disregarding the influence of PGNP. 
You will obtain the following results: 


$$\widehat{\text{CM}}_i = 263.8635 - 2.3905\,\text{FLR}_i \tag{7.7.3}$$
se = (12.2249) (0.2133)   $r^2 = 0.6696$


Again if you compare the results of this (misspecified) regression with the “true” multiple regression, 
you will see that the results are different, although the difference here is not as noticeable as in the case of 
regression (7.7.2). 
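A standard algebraic result, not derived in this section, explains the pattern in regression (7.7.2): in any sample, the slope from the short regression (7.7.1) equals $\hat\beta_2 + \hat\beta_3 b_{32}$, where $b_{32}$ is the slope from regressing the omitted $X_3$ on the included $X_2$. The sketch below checks this identity on hypothetical data (not Table 6.4); the helper names are invented for illustration.

```python
def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def simple_slope(y, x):
    """Slope of the two-variable OLS regression of y on x."""
    yd, xd = center(y), center(x)
    return dot(xd, yd) / dot(xd, xd)


def multiple_slopes(y, x2, x3):
    """(b2, b3) of the three-variable regression, Eqs. (7.4.7)-(7.4.8)."""
    yd, d2, d3 = center(y), center(x2), center(x3)
    s22, s33, s23 = dot(d2, d2), dot(d3, d3), dot(d2, d3)
    den = s22 * s33 - s23 ** 2
    b2 = (dot(yd, d2) * s33 - dot(yd, d3) * s23) / den
    b3 = (dot(yd, d3) * s22 - dot(yd, d2) * s23) / den
    return b2, b3


# Hypothetical data (NOT Table 6.4): y ~ CM, x2 ~ PGNP, x3 ~ FLR.
y = [120.0, 95.0, 160.0, 70.0, 140.0, 55.0, 180.0, 88.0]
x2 = [500.0, 900.0, 300.0, 1500.0, 450.0, 2000.0, 250.0, 1100.0]
x3 = [40.0, 55.0, 30.0, 80.0, 35.0, 90.0, 20.0, 60.0]

alpha2_hat = simple_slope(y, x2)       # short regression: X3 omitted
b2, b3 = multiple_slopes(y, x2, x3)    # regression that includes X3
b32 = simple_slope(x3, x2)             # slope of omitted X3 on included X2

print(alpha2_hat, b2 + b3 * b32)       # identical up to rounding
print(alpha2_hat - b2)                 # the in-sample discrepancy, b3 * b32
```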
The important point to note is that serious consequences can ensue if you misfit a model. We will look into
this topic more thoroughly in Chapter 13, on specification errors. 


7.8 $R^2$ and the Adjusted $R^2$


An important property of $R^2$ is that it is a nondecreasing function of the number of explanatory variables or regressors present in the model, unless the added variable is perfectly collinear with the other regressors; as the number of regressors increases, $R^2$ almost invariably increases and never decreases. Stated differently, an additional $X$ variable will not decrease $R^2$. Compare, for instance, regression (7.7.2) or (7.7.3) with (7.6.2). To see this, recall the definition of the coefficient of determination:

$$R^2 = \frac{\text{ESS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\sum \hat u_i^2}{\sum y_i^2} \tag{7.8.1}$$

Now $\sum y_i^2$ is independent of the number of $X$ variables in the model because it is simply $\sum (Y_i - \bar Y)^2$. The RSS, $\sum \hat u_i^2$, however, depends on the number of regressors present in the model. Intuitively, it is clear that as the number of $X$ variables increases, $\sum \hat u_i^2$ is likely to decrease (at least it will not increase); hence $R^2$ as defined in Eq. (7.8.1) will increase. In view of this, in comparing two regression models with the same dependent variable but differing number of $X$ variables, one should be very wary of choosing the model with the highest $R^2$.
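The nondecreasing property is easy to see in a small experiment. The sketch below, using hypothetical numbers, adds to a two-variable regression a third variable that has no systematic relation to $Y$ and compares the two $R^2$ values; the larger model's $R^2$ is never smaller, even though the added variable is essentially noise. Function names are illustrative.

```python
def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def r2_one_regressor(y, x2):
    """R^2 = 1 - RSS/TSS for the regression of y on x2."""
    yd, d2 = center(y), center(x2)
    b2 = dot(yd, d2) / dot(d2, d2)
    resid = [a - b2 * p for a, p in zip(yd, d2)]
    return 1 - dot(resid, resid) / dot(yd, yd)


def r2_two_regressors(y, x2, x3):
    """R^2 = 1 - RSS/TSS for the regression of y on x2 and x3."""
    yd, d2, d3 = center(y), center(x2), center(x3)
    s22, s33, s23 = dot(d2, d2), dot(d3, d3), dot(d2, d3)
    den = s22 * s33 - s23 ** 2
    b2 = (dot(yd, d2) * s33 - dot(yd, d3) * s23) / den
    b3 = (dot(yd, d3) * s22 - dot(yd, d2) * s23) / den
    resid = [a - b2 * p - b3 * q for a, p, q in zip(yd, d2, d3)]
    return 1 - dot(resid, resid) / dot(yd, yd)


y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.2, 15.8]                # hypothetical data
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x3_noise = [0.3, -1.2, 0.8, 0.1, -0.5, 1.4, -0.9, 0.2]         # unrelated "regressor"

r2_small = r2_one_regressor(y, x2)
r2_big = r2_two_regressors(y, x2, x3_noise)
print(r2_small, r2_big)   # r2_big is never below r2_small
```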

To compare two $R^2$ terms, one must take into account the number of $X$ variables present in the model. This can be done readily if we consider an alternative coefficient of determination, which is as follows:

$$\bar R^2 = 1 - \frac{\sum \hat u_i^2/(n - k)}{\sum y_i^2/(n - 1)} \tag{7.8.2}$$

where $k$ = the number of parameters in the model including the intercept term. (In the three-variable regression, $k = 3$. Why?) The $R^2$ thus defined is known as the adjusted $R^2$, denoted by $\bar R^2$. The term adjusted means adjusted for the df associated with the sums of squares entering into Eq. (7.8.1): $\sum \hat u_i^2$ has $n - k$ df in a model involving $k$ parameters, which include the intercept term, and $\sum y_i^2$ has $n - 1$ df. (Why?) For the three-variable case, we know that $\sum \hat u_i^2$ has $n - 3$ df.

Equation (7.8.2) can also be written as

$$\bar R^2 = 1 - \frac{\hat\sigma^2}{S_Y^2} \tag{7.8.3}$$

where $\hat\sigma^2$ is the residual variance, an unbiased estimator of true $\sigma^2$, and $S_Y^2$ is the sample variance of $Y$.
It is easy to see that $\bar R^2$ and $R^2$ are related because, substituting Eq. (7.8.1) into Eq. (7.8.2), we obtain

$$\bar R^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k} \tag{7.8.4}$$
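Eqs. (7.8.2), (7.8.3), and (7.8.4) are three algebraically equivalent ways of computing $\bar R^2$. The sketch below evaluates all three on hypothetical data for the three-variable model ($k = 3$), using the standard-library `statistics.variance`, which divides by $n - 1$ and therefore equals $S_Y^2$. The helper `rss_and_tss` is an invented name.

```python
import statistics


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def rss_and_tss(y, x2, x3):
    """Residual and total sums of squares for the three-variable OLS fit."""
    yd, d2, d3 = center(y), center(x2), center(x3)
    s22, s33, s23 = dot(d2, d2), dot(d3, d3), dot(d2, d3)
    den = s22 * s33 - s23 ** 2
    b2 = (dot(yd, d2) * s33 - dot(yd, d3) * s23) / den
    b3 = (dot(yd, d3) * s22 - dot(yd, d2) * s23) / den
    resid = [a - b2 * p - b3 * q for a, p, q in zip(yd, d2, d3)]
    return dot(resid, resid), dot(yd, yd)


y = [3.1, 3.9, 6.2, 6.8, 9.9, 10.1, 13.2, 14.8]   # hypothetical data
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0]

n, k = len(y), 3
rss, tss = rss_and_tss(y, x2, x3)
r2 = 1 - rss / tss                                       # Eq. (7.8.1)
adj_a = 1 - (rss / (n - k)) / (tss / (n - 1))            # Eq. (7.8.2)
adj_b = 1 - (rss / (n - k)) / statistics.variance(y)     # Eq. (7.8.3): sigma2_hat / S_Y^2
adj_c = 1 - (1 - r2) * (n - 1) / (n - k)                 # Eq. (7.8.4)
print(r2, adj_a, adj_b, adj_c)
```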
It is immediately apparent from Eq. (7.8.4) that (1) for $k > 1$, $\bar R^2 < R^2$, which implies that as the number of $X$ variables increases, the adjusted $R^2$ increases less than the unadjusted $R^2$; and (2) $\bar R^2$ can be negative, although $R^2$ is necessarily nonnegative.¹⁰ In case $\bar R^2$ turns out to be negative in an application, its value is taken as zero.

Which $R^2$ should one use in practice? As Theil notes:

... it is good practice to use $\bar R^2$ rather than $R^2$ because $R^2$ tends to give an overly optimistic picture of the fit of the regression, particularly when the number of explanatory variables is not very small compared with the number of observations.¹¹

But Theil’s view is not uniformly shared, for he has offered no general theoretical justification for the “superiority” of $\bar R^2$. For example, Goldberger argues that the following $R^2$, call it modified $R^2$, will do just as well:¹²


$$\text{Modified } R^2 = (1 - k/n)\,R^2 \tag{7.8.5}$$

His advice is to report $R^2$, $n$, and $k$ and let the reader decide how to adjust $R^2$ by allowing for $n$ and $k$.

Despite this advice, it is the adjusted $R^2$, as given in Eq. (7.8.4), that is reported by most statistical packages along with the conventional $R^2$. The reader is well advised to treat $\bar R^2$ as just another summary statistic.

Incidentally, for the child mortality regression (7.6.2), the reader should verify that $\bar R^2$ is 0.6981, keeping in mind that in this example $(n - 1) = 63$ and $(n - k) = 61$. As expected, $\bar R^2$ of 0.6981 is less than $R^2$ of 0.7077.
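The verification suggested above takes one line once Eq. (7.8.4) is coded. The sketch below plugs in the reported $R^2 = 0.7077$ with $n = 64$ and $k = 3$; it also evaluates Goldberger's modified $R^2$ of Eq. (7.8.5) for comparison, and the case $R^2 = 0$, where $\bar R^2 = (1 - k)/(n - k)$ is negative. The function names are illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2, Eq. (7.8.4); k counts the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


def modified_r2(r2, n, k):
    """Goldberger's modified R^2, Eq. (7.8.5)."""
    return (1 - k / n) * r2


r2, n, k = 0.7077, 64, 3                  # values reported for regression (7.6.2)
print(round(adjusted_r2(r2, n, k), 4))    # 0.6981
print(round(modified_r2(r2, n, k), 4))
print(adjusted_r2(0.0, n, k))             # negative: equals (1 - k)/(n - k)
```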

Besides $R^2$ and adjusted $R^2$ as goodness of fit measures, other criteria are often used to judge the adequacy
of a regression model. Two of these are Akaike’s Information criterion and Amemiya’s Prediction 
criteria, which are used to select between competing models. We will discuss these criteria when we consider 
the problem of model selection in greater detail in a later chapter (see Chapter 13). 


Comparing Two $R^2$ Values


It is crucial to note that in comparing two models on the basis of the coefficient of determination, whether 
adjusted or not, the sample size n and the dependent variable must be the same; the explanatory variables 
may take any form. Thus for the models 


$$\ln Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \qquad (7.8.6)$$
$$Y_i = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + u_i \qquad (7.8.7)$$


the computed $R^2$ terms cannot be compared. The reason is as follows: By definition, $R^2$ measures the proportion of the variation in the dependent variable accounted for by the explanatory variable(s). Therefore, in Eq. (7.8.6) $R^2$ measures the proportion of the variation in $\ln Y$ explained by $X_2$ and $X_3$, whereas in Eq. (7.8.7) it measures the proportion of the variation in Y, and the two are not the same thing: As noted in Chapter 6, a change in $\ln Y$ gives a relative or proportional change in Y, whereas a change in Y gives an absolute change.


¹⁰Note, however, that if $R^2 = 1$, $\bar{R}^2 = R^2 = 1$. When $R^2 = 0$, $\bar{R}^2 = (1 - k)/(n - k)$, in which case $\bar{R}^2$ can be negative if $k > 1$.
¹¹Henri Theil, Introduction to Econometrics, Prentice Hall, Englewood Cliffs, NJ, 1978.
¹²Arthur S. Goldberger, A Course in Econometrics, Harvard University Press, Cambridge, Mass., 1991, p. 178. For a more critical view of $\bar{R}^2$, see S. Cameron, “Why Is the R Squared Adjusted Reported?” Journal of Quantitative Economics, vol. 9, no. 1, January 1993, pp. 183-186. He argues that “It [$\bar{R}^2$] is NOT a test statistic and there seems to be no clear intuitive justification for its use as a descriptive statistic. Finally, we should be clear that it is not an effective tool for the prevention of data mining” (p. 186).
Therefore, $\operatorname{var}(\hat{Y}_i)/\operatorname{var}(Y_i)$ is not equal to $\operatorname{var}(\widehat{\ln Y_i})/\operatorname{var}(\ln Y_i)$; that is, the two coefficients of determination are not the same.¹³

How then does one compare the $R^2$'s of two models when the regressand is not in the same form? To
answer this question, let us first consider a numerical example. 


Example 7.2 Coffee Consumption in the United States, 1970-1980 


Consider the data in Table 7.1. The data pertain to consumption of cups of coffee per day (Y) and real 
retail price of coffee (X) in the United States for years 1970-1980. Applying OLS to the data, we obtain the 
following regression results: 


$$\hat{Y}_i = 2.6911 - 0.4795 X_i \qquad (7.8.8)$$
$$\text{se} = (0.1216)\quad (0.1140) \qquad \text{RSS} = 0.1491;\quad r^2 = 0.6628$$

The results make economic sense: As the real price of coffee goes up by a dollar per pound, coffee consumption goes down, on average, by about half a cup per person per day. The $r^2$ value of about 0.66 means that the price of coffee explains about 66 percent of the variation in coffee consumption. The reader can readily verify that the slope coefficient is statistically significant.


Table 7.1 U.S. Coffee Consumption (Y) in Relation to Average Real Retail Price (X),* 1970-1980

Year    Y, Cups per Person per Day    X, $ per lb
1970    2.57                          0.77
1971    2.50                          0.74
1972    2.35                          0.72
1973    2.30                          0.73
1974    2.25                          0.76
1975    2.20                          0.75
1976    2.11                          1.08
1977    1.94                          1.81
1978    1.97                          1.39
1979    2.06                          1.20
1980    2.02                          1.17


*Note: The nominal price was divided by the Consumer Price Index (CPI) for food and beverages, 1967 = 100. 
Source: The data for Y are from Summary of National Coffee Drinking Study, Data Group, Elkins Park, Penn., 1981; and the data on nominal X (i.e., X in 
current prices) are from Nielsen Food Index, A. C. Nielsen, New York, 1981. 

I am indebted to Scott E. Sandberg for collecting the data. 
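Regression (7.8.8) can be reproduced from Table 7.1 with a few lines of Python. The sketch below is one way to do it, assuming NumPy is available; it computes the two-variable OLS estimates from deviations about the means. The 1978 price is entered as 1.39, the value implied by the 1978 fitted value in Table 7.2, since $(2.6911 - 2.024579)/0.4795 \approx 1.39$.

```python
import numpy as np

# Table 7.1: cups of coffee per person per day (Y) and real price in $ per lb (X), 1970-1980
Y = np.array([2.57, 2.50, 2.35, 2.30, 2.25, 2.20, 2.11, 1.94, 1.97, 2.06, 2.02])
X = np.array([0.77, 0.74, 0.72, 0.73, 0.76, 0.75, 1.08, 1.81, 1.39, 1.20, 1.17])


def ols_simple(y, x):
    """Two-variable OLS from deviations about the means.

    Returns the intercept, slope, fitted values, residual sum of squares, and r^2.
    """
    x_dev, y_dev = x - x.mean(), y - y.mean()
    b2 = (x_dev @ y_dev) / (x_dev @ x_dev)
    b1 = y.mean() - b2 * x.mean()
    fitted = b1 + b2 * x
    rss = float(np.sum((y - fitted) ** 2))
    r2 = 1.0 - rss / float(y_dev @ y_dev)
    return b1, b2, fitted, rss, r2


b1, b2, Y_hat, rss, r2 = ols_simple(Y, X)
print(f"Y_hat = {b1:.4f} + ({b2:.4f}) X;  RSS = {rss:.4f};  r^2 = {r2:.4f}")
```

The fitted values `Y_hat` correspond to column (2) of Table 7.2.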


¹³From the definition of $R^2$, we know that

$$1 - R^2 = \frac{\text{RSS}}{\text{TSS}} = \frac{\sum \hat{u}_i^2}{\sum (Y_i - \bar{Y})^2}$$

for the linear model and

$$1 - R^2 = \frac{\sum \hat{u}_i^2}{\sum (\ln Y_i - \overline{\ln Y})^2}$$

for the log model. Since the denominators on the right-hand sides of these expressions are different, we cannot compare the two $R^2$ terms directly.

As shown in Example 7.2, for the linear specification, the RSS = 0.1491 (the residual sum of squares of coffee consumption), and for the log-linear specification, the RSS = 0.0226 (the residual sum of squares of log of coffee consumption). These residuals are of different orders of magnitude and hence are not directly comparable.
From the same data, the following double-log, or constant elasticity, model can be estimated:

$$\widehat{\ln Y_i} = 0.7774 - 0.2530 \ln X_i \qquad (7.8.9)$$
$$\text{se} = (0.0152)\quad (0.0494) \qquad \text{RSS} = 0.0226;\quad r^2 = 0.7448$$

Since this is a double-log model, the slope coefficient gives a direct estimate of the price elasticity coefficient. In the present instance, it tells us that if the price of coffee per pound goes up by 1 percent, daily coffee consumption goes down, on average, by about 0.25 percent. Remember that in the linear model (7.8.8) the slope coefficient only gives the rate of change of coffee consumption with respect to price. (How will you estimate the price elasticity for the linear model?) The $r^2$ value of about 0.74 means that about 74 percent of the variation in the log of coffee demand is explained by the variation in the log of coffee price.

Since the $r^2$ value of 0.6628 of the linear model is smaller than the $r^2$ value of 0.7448 of the log-linear model, you might be tempted to choose the latter model because of its higher $r^2$ value. But for reasons already noted, we cannot do so. If you do want to compare the two $r^2$ values, however, you may proceed as follows:

1. Obtain $\widehat{\ln Y_i}$ from Eq. (7.8.9) for each observation; that is, obtain the estimated log value of each observation from this model. Take the antilog of these values and then compute $r^2$ between these antilog values and actual $Y_i$ in the manner indicated by Eq. (3.5.14). This $r^2$ value is comparable to the $r^2$ value of the linear model (7.8.8).

2. Alternatively, assuming all Y values are positive, take logarithms of the Y values, $\ln Y_i$. Obtain the estimated Y values, $\hat{Y}_i$, from the linear model (7.8.8), take the logarithms of these estimated Y values (i.e., $\ln \hat{Y}_i$), and compute the $r^2$ between $\ln Y_i$ and $\ln \hat{Y}_i$ in the manner indicated in Eq. (3.5.14). This $r^2$ value is comparable to the $r^2$ value obtained from Eq. (7.8.9).

For our coffee example, we present the necessary raw data to compute the comparable $r^2$'s in Table 7.2. To compare the $r^2$ value of the linear model (7.8.8) with that of (7.8.9), we first obtain $\ln \hat{Y}_i$ (given in column [6] of Table 7.2), then we obtain the log of actual Y values (given in column [5] of the table), and then compute $r^2$ between these two sets of values using Eq. (3.5.14). The result is an $r^2$ value of 0.6779, which is now comparable with the $r^2$ value of 0.7448 of the log-linear model. The difference between the two $r^2$ values is about 0.07.
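The elasticity comparison can be sketched the same way. In the double-log model (7.8.9) the slope is itself the price elasticity; in the linear model (7.8.8) the elasticity $(dY/dX)(X/Y)$ changes along the line, so the sketch below evaluates it at the sample means $\bar{X}$ and $\bar{Y}$. That is one common answer to the question posed in the text, not the only one. NumPy and the 1978 price of 1.39 (the value implied by Table 7.2) are assumptions.

```python
import numpy as np

# Table 7.1 data; the 1978 price of 1.39 is the value implied by Table 7.2
Y = np.array([2.57, 2.50, 2.35, 2.30, 2.25, 2.20, 2.11, 1.94, 1.97, 2.06, 2.02])
X = np.array([0.77, 0.74, 0.72, 0.73, 0.76, 0.75, 1.08, 1.81, 1.39, 1.20, 1.17])


def ols_simple(y, x):
    """Two-variable OLS; returns (intercept, slope)."""
    x_dev = x - x.mean()
    slope = (x_dev @ (y - y.mean())) / (x_dev @ x_dev)
    return y.mean() - slope * x.mean(), slope


# Double-log model (7.8.9): the slope is the (constant) price elasticity
a1, a2 = ols_simple(np.log(Y), np.log(X))

# Linear model (7.8.8): elasticity = slope * X / Y, evaluated here at the means
b1, b2 = ols_simple(Y, X)
elasticity_at_means = b2 * X.mean() / Y.mean()

print(f"double-log: ln Y_hat = {a1:.4f} + ({a2:.4f}) ln X")
print(f"linear model elasticity at the means: {elasticity_at_means:.4f}")
```

The linear model's elasticity at the means, roughly -0.22, is of the same order as the double-log estimate of about -0.25, but it would differ at other points on the regression line.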


Table 7.2

Year   (1) Y_i   (2) Ŷ_i    (3) est. ln Y_i   (4) Antilog of (3)   (5) ln Y_i   (6) ln Ŷ_i
1970   2.57      2.321887   0.843555          2.324616             0.943906     0.842380
1971   2.50      2.336272   0.853611          2.348111             0.916291     0.848557
1972   2.35      2.345863   0.860544          2.364447             0.854415     0.852653
1973   2.30      2.341068   0.857054          2.356209             0.832909     0.850607
1974   2.25      2.326682   0.846863          2.332318             0.810930     0.844443
1975   2.20      2.331477   0.850214          2.340149             0.788457     0.846502
1976   2.11      2.173233   0.757943          2.133882             0.746688     0.776216
1977   1.94      1.823176   0.627279          1.872508             0.662688     0.600580
1978   1.97      2.024579   0.694089          2.001884             0.678034     0.705362
1979   2.06      2.115689   0.731282          2.077742             0.722706     0.749381
1980   2.02      2.130075   0.737688          2.091096             0.703098     0.756157

Notes: Column (1): Actual Y values from Table 7.1.
Column (2): Estimated Y values from the linear model (7.8.8).
Column (3): Estimated log Y values from the double-log model (7.8.9).
Column (4): Antilog of values in column (3).
Column (5): Log values of Y in column (1).
Column (6): Log values of Ŷ_i in column (2).
On the other hand, if we want to compare the $r^2$ value of the log-linear model with the linear model, we obtain $\widehat{\ln Y_i}$ for each observation from Eq. (7.8.9) (given in column [3] of the table), obtain their antilog values (given in column [4] of the table), and finally compute $r^2$ between these antilog values and the actual Y values, using formula (3.5.14). This will give an $r^2$ value of 0.7187, which is slightly higher than that obtained from the linear model (7.8.8), namely, 0.6628.

Using either method, it seems that the log-linear model gives a slightly better fit.
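The two procedures for making the $r^2$ values comparable can be sketched as follows, again assuming NumPy and the 1978 price of 1.39. We take Eq. (3.5.14) to be the squared simple correlation between two series, which is what `np.corrcoef` yields once squared; the helper names are ours.

```python
import numpy as np

# Table 7.1 data (1978 price taken as 1.39, the value implied by Table 7.2)
Y = np.array([2.57, 2.50, 2.35, 2.30, 2.25, 2.20, 2.11, 1.94, 1.97, 2.06, 2.02])
X = np.array([0.77, 0.74, 0.72, 0.73, 0.76, 0.75, 1.08, 1.81, 1.39, 1.20, 1.17])


def ols_fitted(y, x):
    """Fitted values from a two-variable OLS regression of y on x."""
    x_dev = x - x.mean()
    slope = (x_dev @ (y - y.mean())) / (x_dev @ x_dev)
    return y.mean() + slope * x_dev


def r2_between(a, b):
    """Squared simple correlation coefficient between two series."""
    return float(np.corrcoef(a, b)[0, 1] ** 2)


Y_hat_linear = ols_fitted(Y, X)                    # column (2) of Table 7.2
lnY_hat_loglog = ols_fitted(np.log(Y), np.log(X))  # column (3)

# Procedure 2: linear model judged in log units, columns (5) and (6)
r2_linear_in_logs = r2_between(np.log(Y), np.log(Y_hat_linear))
# Procedure 1: double-log model judged in levels, columns (1) and (4)
r2_loglog_in_levels = r2_between(Y, np.exp(lnY_hat_loglog))

print(f"linear model, comparable with 0.7448:     {r2_linear_in_logs:.4f}")
print(f"double-log model, comparable with 0.6628: {r2_loglog_in_levels:.4f}")
```

Each comparable value is computed on the same scale as the $r^2$ it is matched against (logs for 0.7448, levels for 0.6628), which is the whole point of the exercise.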


Allocating $R^2$ among Regressors


Let us return to our child mortality example. We saw in Eq. (7.6.2) that the two regressors PGNP and FLR explain 0.7077 or 70.77 percent of the variation in child mortality. But now consider the regression (7.7.2) where we dropped the FLR variable and as a result the $r^2$ value dropped to 0.1662. Does that mean the difference in the $r^2$ value of 0.5415 (0.7077 - 0.1662) is attributable to the dropped variable FLR? On the other hand, if you consider regression (7.7.3), where we dropped the PGNP variable, the $r^2$ value drops to 0.6696. Does that mean the difference in the $r^2$ value of 0.0381 (0.7077 - 0.6696) is due to the omitted variable PGNP?

The question then is: Can we allocate the multiple $R^2$ of 0.7077 between the two regressors, PGNP and FLR, in this manner? Unfortunately, we cannot do so, for the allocation depends on the order in which the regressors are introduced, as we just illustrated. Part of the problem here is that the two regressors are correlated, the correlation coefficient between the two being 0.2685 (verify it from the data given in Table 6.4). In most applied work with several regressors, correlation among them is a common problem. Of course, the problem will be very serious if there is perfect collinearity among the regressors.

The best practical advice is that there is little point in trying to allocate the $R^2$ value to its constituent regressors.
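The order dependence can be made explicit with nothing more than the three $r^2$ values quoted above. The short Python sketch below assigns the increment in $R^2$ to each regressor under each order of entry; both orders exhaust the same total of 0.7077 but split it very differently.

```python
# Figures quoted in the text for the child mortality example
r2_full = 0.7077       # regression (7.6.2): PGNP and FLR
r2_pgnp_only = 0.1662  # regression (7.7.2): FLR dropped
r2_flr_only = 0.6696   # regression (7.7.3): PGNP dropped

# Order 1: PGNP enters first, FLR gets whatever is added afterwards
order1 = {"PGNP": r2_pgnp_only, "FLR": r2_full - r2_pgnp_only}
# Order 2: FLR enters first, PGNP gets whatever is added afterwards
order2 = {"FLR": r2_flr_only, "PGNP": r2_full - r2_flr_only}

for label, shares in [("PGNP first", order1), ("FLR first", order2)]:
    print(label, {name: round(value, 4) for name, value in shares.items()})
```

The credit given to PGNP ranges from 0.1662 to 0.0381 depending only on when it enters, which is exactly why such an allocation carries little meaning when the regressors are correlated.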


The “Game” of Maximizing $R^2$


In concluding this section, a warning is in order: Sometimes researchers play the game of maximizing $R^2$, that is, choosing the model that gives the highest $R^2$. But this may be dangerous, for in regression analysis our objective is not to obtain a high $R^2$ per se but rather to obtain dependable estimates of the true population regression coefficients and draw statistical inferences about them. In empirical analysis it is not unusual to obtain a very high $R^2$ but find that some of the regression coefficients either are statistically insignificant or have signs that are contrary to a priori expectations. Therefore, the researcher should be more concerned about the logical or theoretical relevance of the explanatory variables to the dependent variable and their statistical significance. If in this process we obtain a high $R^2$, well and good; on the other hand, if $R^2$ is low, it does not mean the model is necessarily bad.¹⁴
As a matter of fact, Goldberger is very critical about the role of $R^2$. He has said:


From our perspective, $R^2$ has a very modest role in regression analysis, being a measure of the goodness of fit of a sample LS [least-squares] linear regression in a body of data. Nothing in the CR [CLRM] model requires that $R^2$ be high. Hence a high $R^2$ is not evidence in favor of the model and a low $R^2$ is not evidence against it.


¹⁴Some authors would like to deemphasize the use of $R^2$ as a measure of goodness of fit as well as its use for comparing two or more $R^2$ values. See Christopher H. Achen, Interpreting and Using Regression, Sage Publications, Beverly Hills, Calif., 1982, pp. 58-67, and C. Granger and P. Newbold, “$R^2$ and the Transformation of Regression Variables,” Journal of Econometrics, vol. 4, 1976, pp. 205-210. Incidentally, the practice of choosing a model on the basis of highest $R^2$, a kind of data mining, introduces what is known as pretest bias, which might destroy some of the properties of OLS estimators of the classical linear regression model. On this topic, the reader may want to consult George G. Judge, Carter R. Hill, William E. Griffiths, Helmut Lütkepohl, and Tsoung-Chao Lee, Introduction to the Theory and Practice of Econometrics, John Wiley, New York, 1982, Chapter 21.
In fact the most important thing about $R^2$ is that it is not important in the CR model. The CR model is concerned with parameters in a population, not with goodness of fit in the sample. . . . If one insists on a measure of predictive success (or rather failure), then $\sigma^2$ might suffice: after all, the parameter $\sigma^2$ is the expected squared forecast error that would result if the population CEF [PRF] were used as the predictor. Alternatively, the squared standard error of forecast . . . at relevant values of x [regressors] may be informative.¹⁵


7.9 The Cobb-Douglas Production Function: More on Functional Form 


In Section 6.4 we showed how with appropriate transformations we can convert nonlinear relationships into 
linear ones so that we can work within the framework of the classical linear regression model. The various 
transformations discussed there in the context of the two-variable case can be easily extended to multiple 
regression models. We demonstrate transformations in this section by taking up the multivariable extension 
of the two-variable log-linear model; others can be found in the exercises and in the illustrative examples 
discussed throughout the rest of this book. The specific example we discuss is the celebrated Cobb-Douglas
production function of production theory. 
The Cobb-Douglas production function, in its stochastic form, may be expressed as 


$$Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} e^{u_i} \qquad (7.9.1)$$
where Y = output
$X_2$ = labor input
$X_3$ = capital input
u = stochastic disturbance term
e = base of natural logarithm


From Eq. (7.9.1) it is clear that the relationship between output and the two inputs is nonlinear. However, if we log-transform this model, we obtain:

$$\begin{aligned} \ln Y_i &= \ln \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i \\ &= \beta_0 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i \end{aligned} \qquad (7.9.2)$$

where $\beta_0 = \ln \beta_1$. Thus written, the model is linear in the parameters $\beta_0$, $\beta_2$, and $\beta_3$ and is therefore a linear regression model. Notice, though, it is nonlinear in the variables Y and X but linear in the logs of these variables. In short, Eq. (7.9.2) is a log-log, double-log, or log-linear model, the multiple regression counterpart of the two-variable log-linear model (6.5.3).

The properties of the Cobb-Douglas production function are quite well known: 

1. $\beta_2$ is the (partial) elasticity of output with respect to the labor input, that is, it measures the percentage change in output for, say, a 1 percent change in the labor input, holding the capital input constant (see Exercise 7.9).

2. Likewise, $\beta_3$ is the (partial) elasticity of output with respect to the capital input, holding the labor input constant.

3. The sum $(\beta_2 + \beta_3)$ gives information about the returns to scale, that is, the response of output to a proportionate change in the inputs. If this sum is 1, then there are constant returns to scale, that is, doubling the inputs will double the output, tripling the inputs will triple the output, and so on. If the sum is less than 1, there are decreasing returns to scale: doubling the inputs will less than double the output. Finally, if the sum is greater than 1, there are increasing returns to scale: doubling the inputs will more than double the output.


¹⁵Arthur S. Goldberger, op. cit., pp. 177-178.
Before proceeding further, note that whenever you have a log-linear regression model involving any number of variables the coefficient of each of the X variables measures the (partial) elasticity of the dependent variable Y with respect to that variable. Thus, if you have a k-variable log-linear model:

$$\ln Y_i = \beta_0 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + \cdots + \beta_k \ln X_{ki} + u_i \qquad (7.9.3)$$

each of the (partial) regression coefficients, $\beta_2$ through $\beta_k$, is the (partial) elasticity of Y with respect to variables $X_2$ through $X_k$.¹⁶
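The claim that each slope in a log-linear model is a constant elasticity can be checked numerically. The sketch below assumes a deterministic Cobb-Douglas function with hypothetical parameters ($\beta_1 = 2$, $\beta_2 = 0.6$, $\beta_3 = 0.3$, chosen only for illustration) and approximates $\partial \ln Y/\partial \ln X_2$ by a central difference in $\ln X_2$; the result equals $\beta_2$ at every input combination tried.

```python
import math

# Hypothetical Cobb-Douglas parameters, for illustration only
BETA1, BETA2, BETA3 = 2.0, 0.6, 0.3


def output(x2: float, x3: float) -> float:
    """Deterministic part of Eq. (7.9.1): Y = beta1 * X2**beta2 * X3**beta3."""
    return BETA1 * x2 ** BETA2 * x3 ** BETA3


def elasticity_wrt_x2(x2: float, x3: float, h: float = 1e-6) -> float:
    """Central-difference estimate of d ln Y / d ln X2, holding X3 fixed."""
    ln_y_up = math.log(output(x2 * math.exp(h), x3))
    ln_y_down = math.log(output(x2 * math.exp(-h), x3))
    return (ln_y_up - ln_y_down) / (2 * h)


for x2, x3 in [(1.0, 1.0), (50.0, 3.0), (1000.0, 250.0)]:
    print(x2, x3, round(elasticity_wrt_x2(x2, x3), 6))
```

Swapping the roles of the two inputs in `elasticity_wrt_x2` would return $\beta_3$ in the same way.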


Example 7.3 Value Added, Labor Hours, and Capital Input in the Manufacturing Sector 


To illustrate the Cobb-Douglas production function, we obtained the data shown in Table 7.3; these data are 
for the manufacturing sector of all 50 states and Washington, DC, for 2005. 

Assuming that the model (7.9.2) satisfies the assumptions of the classical linear regression model,¹⁷ we
obtained the following regression by the OLS method (see Appendix 7A, Section 7A.5 for the computer 
printout): 


Table 7.3 Value Added, Labor Hours, and Capital Input in the Manufacturing Sector of the U.S., 2005 


Columns: Y = output, value added (thousands of $); X2 = labor input, worker hours (thousands); X3 = capital input, capital expenditure (thousands of $)
Area Y X2 X3
Alabama 38,372,840 424,471 2,689,076 
Alaska 1,805,427 19,895 57,997 
Arizona 23,736,129 206,893 2,308,272 
Arkansas 26,981,983 304,055 1,376,235 
California 217,546,032 1,809,756 13,554,116 
Colorado 19,462,751 180,366 1,790,751 
Connecticut 28,972,772 224,267 1,210,229 
Delaware 14,313,157 54,455 421,064 
District of Columbia 159,921 2,029 7,188 
Florida 47,289,846 471,211 2,761,281
Georgia 63,015,125 659,379 3,540,475 
Hawaii 1,809,052 17,528 146,371 
Idaho 10,511,786 75,414 848,220 
Illinois 105,324,866 963,156 5,870,409 
Indiana 90,120,459 835,083 5,832,503 
Iowa 39,079,550 336,159 1,795,976
Kansas 22,826,760 246,144 1,595,118 
Kentucky 38,686,340 384,484 2,503,693 
Louisiana 69,910,555 216,149 4,726,625 


¹⁶To see this, differentiate Eq. (7.9.3) partially with respect to the log of each X variable. Therefore, $\partial \ln Y/\partial \ln X_2 = (\partial Y/\partial X_2)(X_2/Y) = \beta_2$, which, by definition, is the elasticity of Y with respect to $X_2$, and $\partial \ln Y/\partial \ln X_3 = (\partial Y/\partial X_3)(X_3/Y) = \beta_3$, which is the elasticity of Y with respect to $X_3$, and so on.

¹⁷Notice that in the Cobb-Douglas production function (7.9.1) we have introduced the stochastic error term in a special way so that in the resulting logarithmic transformation it enters in the usual linear form. On this, see Section 6.9.
Maine 7,856,947 82,021 415,131
Maryland 21,352,966 174,855 1,729,116 
Massachusetts 46,044,292 355,701 2,706,065 
Michigan 92,335,528 943,298 5,294,356 
Minnesota 48,304,274 456,553 2,833,525 
Mississippi 17,207,903 267,806 1,212,289
Missouri 47,340,157 439,427 2,404,122 
Montana 2,644,567 24,167 334,008 
Nebraska 14,650,080 163,637 627,806 
Nevada 7,290,360 59,737 5227555 
New Hampshire 9,188,322 96,106 507,488 
New Jersey 51,298,516 407,076 3,295,056 
New Mexico 20,401,410 43,079 404,749 
New York 87,756,129 727,177 4,260,353 
North Carolina 101,268,432 820,013 4,086,558 
North Dakota 3,556,025 34,723 184,700 
Ohio 124,986,166 1,174,540 6,301,421 
Oklahoma 20,451,196 201,284 1327359 
Oregon 34,808,109 257,820 1,456,683 
Pennsylvania 104,858,322 944,998 5,896,392 
Rhode Island 6,541,356 68,987 297,618 
South Carolina 37,668,126 400,317 2,500,071 
South Dakota 4,988,905 56,524 311,251 
Tennessee 62,828,100 582,241 4,126,465
Texas 172,960,157 1,120,382 11,588,283 
Utah 15,702,637 150,030 762,671 
Vermont 5,418,786 48,134 276,293 
Virginia 49,166,991 425,346 2,731,669 
Washington 46,164,427 313,279 1,945,860 
West Virginia 9,185,967 89,639 685,587 
Wisconsin 66,964,978 694,628 3,902,823 
Wyoming 2,979,475 15,221 361,536 
Source: 2005 Annual Survey of Manufacturers, Sector 31: Supplemental Statistics for U.S. 
$$\widehat{\ln Y_i} = 3.8876 + 0.4683 \ln X_{2i} + 0.5213 \ln X_{3i} \qquad (7.9.4)$$
$$\text{se} = (0.3962)\quad (0.0989)\quad (0.0969)$$
$$t = (9.8115)\quad (4.7342)\quad (5.3803)$$
$$R^2 = 0.9642 \qquad \bar{R}^2 = 0.9627 \qquad \text{df} = 48$$

From Eq. (7.9.4) we see that in the U.S. manufacturing sector for 2005, the output elasticities of labor and capital were 0.4683 and 0.5213, respectively. In other words, over the 50 U.S. states and the District of Columbia, holding the capital input constant, a 1 percent increase in the labor input led on the average to about a 0.47 percent increase in the output. Similarly, holding the labor input constant, a 1 percent increase in the capital input led on the average to about a 0.52 percent increase in the output. Adding the two output elasticities, we obtain 0.99, which gives the value of the returns to scale parameter. Since this sum is very close to 1, the manufacturing sector for the 50 United States and the District of Columbia appears to have been characterized by constant returns to scale.
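What the sum of the elasticities means in practice can be sketched directly: scaling both inputs by a factor $\lambda$ multiplies Cobb-Douglas output by $\lambda^{\beta_2 + \beta_3}$. The Python sketch below plugs in the estimated elasticities from Eq. (7.9.4); the input levels are hypothetical and the intercept is set to 1, since it cancels from any output ratio.

```python
# Output elasticities of labor and capital estimated in Eq. (7.9.4)
BETA2_LABOR, BETA3_CAPITAL = 0.4683, 0.5213


def cobb_douglas(x2: float, x3: float, beta1: float = 1.0) -> float:
    """Deterministic part of Eq. (7.9.1); beta1 cancels out of any output ratio."""
    return beta1 * x2 ** BETA2_LABOR * x3 ** BETA3_CAPITAL


scale = BETA2_LABOR + BETA3_CAPITAL  # returns-to-scale parameter, about 0.99

# Hypothetical input levels (thousands of worker hours, thousands of $)
x2, x3 = 400_000.0, 2_500_000.0
for lam in (2.0, 3.0):
    ratio = cobb_douglas(lam * x2, lam * x3) / cobb_douglas(x2, x3)
    print(f"inputs x{lam:g} -> output x{ratio:.4f}; lam**scale = {lam ** scale:.4f}")
```

Doubling both inputs raises output by a factor of about 1.986, very close to 2, which is why a sum of 0.99 is read as approximately constant returns to scale; whether it differs significantly from 1 is a hypothesis-testing question of the kind taken up in Chapter 8.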
From a purely statistical viewpoint, the estimated regression line fits the data quite well. The $R^2$ value of 0.9642 means that about 96 percent of the variation in the (log of) output is explained by the (logs of) labor and capital. In Chapter 8, we shall see how the estimated standard errors can be used to test hypotheses about the “true” values of the parameters of the Cobb-Douglas production function for the U.S. manufacturing sector of the economy.


7.10 Polynomial Regression Models 


We now consider a class of multiple regression models, the polynomial regression models, that have found extensive use in econometric research relating to cost and production functions. In introducing these models, we further extend the range of models to which the classical linear regression model can easily be applied.

To fix the ideas, consider Figure 7.1, which relates the short-run marginal cost (MC) of production (Y) of a commodity to the level of its output (X). The visually drawn MC curve in the figure, the textbook U-shaped curve, shows that the relationship between MC and output is nonlinear. If we were to quantify this relationship from the given scatterpoints, how would we go about it? In other words, what type of econometric model would capture first the declining and then the increasing nature of marginal cost?

Figure 7.1 The U-shaped marginal cost curve (vertical axis: marginal cost; horizontal axis: output).

Geometrically, the MC curve depicted in Figure 7.1 represents a parabola. Mathematically, the parabola is represented by the following equation:

$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 \qquad (7.10.1)$$

which is called a quadratic function, or more generally, a second-degree polynomial in the variable X; the highest power of X represents the degree of the polynomial (if $X^3$ were added to the preceding function, it would be a third-degree polynomial, and so on).

The stochastic version of Eq. (7.10.1) may be written as

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i \qquad (7.10.2)$$

which is called a second-degree polynomial regression.
The general kth degree polynomial regression may be written as

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_k X_i^k + u_i \qquad (7.10.3)$$

Notice that in these types of polynomial regressions there is only one explanatory variable on the right-hand side but it appears with various powers, thus making them multiple regression models. Incidentally, note that if $X_i$ is assumed to be fixed or nonstochastic, the powered terms of $X_i$ also become fixed or nonstochastic.
Do these models present any special estimation problems? Since the second-degree polynomial (7.10.2) or the kth degree polynomial (7.10.3) is linear in the parameters, the β's, they can be estimated by the usual OLS or ML methodology. But what about the collinearity problem? Aren't the various X's highly correlated since they are all powers of X? Yes, but remember that terms like $X^2$, $X^3$, $X^4$, etc., are all nonlinear functions of X and hence, strictly speaking, do not violate the assumption of no perfect multicollinearity (no exact linear relationship among the regressors). In short, polynomial regression models can be estimated by the techniques presented in this chapter and present no new estimation problems.
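That a polynomial regression needs no new estimation machinery can be illustrated directly: a cubic is just a linear regression on the columns $1, X, X^2, X^3$. The sketch below (NumPy assumed) builds that design matrix for hypothetical output levels 1 to 10, generates noise-free values from the coefficients reported later in (7.10.6), used here purely as illustrative values since Table 7.4 is not reproduced, and shows that least squares recovers them even though the powers of X are highly correlated.

```python
import numpy as np

# Hypothetical output levels; Table 7.4 itself is not reproduced here
X = np.arange(1.0, 11.0)
# Coefficients of (7.10.6), borrowed only as illustrative "true" values
true_beta = np.array([141.7667, 63.4776, -12.9615, 0.9396])

# Design matrix with columns 1, X, X^2, X^3: the cubic is linear in the betas
design = np.vander(X, N=4, increasing=True)
Y = design @ true_beta  # noise-free, so least squares should recover true_beta

beta_hat, *_ = np.linalg.lstsq(design, Y, rcond=None)
print("estimated betas:", np.round(beta_hat, 4))

# The powers of X are highly correlated, but the design matrix still has full column rank
print("corr(X, X^2) =", round(float(np.corrcoef(X, X ** 2)[0, 1]), 4))
print("rank of design matrix:", np.linalg.matrix_rank(design))
```

Full column rank is what OLS actually requires; high but imperfect correlation among the powers affects the precision of the estimates, not their computability.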
Example 7.4 Estimating the Total Cost Function


As an example of the polynomial regression, consider the data on output and total cost of production of a 
commodity in the short run given in Table 7.4. What type of regression model will fit these data? For this 
purpose, let us first draw the scattergram, which is shown in Figure 7.2. 

From this figure it is clear that the relationship between total cost and output resembles the elongated S 
curve; notice how the total cost curve first increases gradually and then rapidly, as predicted by the celebrated 
law of diminishing returns. This S shape of the total cost curve can be captured by the following cubic or third- 
degree polynomial: 

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + u_i \qquad (7.10.4)$$
where Y = total cost and X = output.

Given the data of Table 7.4, we can apply the OLS method to estimate the parameters of Eq. (7.10.4). But 
before we do that, let us find out what economic theory has to say about the short-run cubic cost function 
(7.10.4). Elementary price theory shows that in the short run the marginal cost (MC) and average cost (AC) curves of production are typically U-shaped: initially, as output increases both MC and AC decline, but after a certain level of output they both turn upward, again the consequence of the law of diminishing returns. This can be seen in Figure 7.3 (see also Figure 7.1). And since the MC and AC curves are derived from the total cost curve, the U-shaped nature of these curves puts some restrictions on the parameters of the total cost curve (7.10.4). As a matter of fact, it can be shown that the parameters of Eq. (7.10.4) must satisfy the following restrictions if one is to observe the typical U-shaped short-run marginal and average cost curves:¹⁸


1. $\beta_0$, $\beta_1$, and $\beta_3 > 0$
2. $\beta_2 < 0$ (7.10.5)
3. $\beta_2^2 < 3\beta_1\beta_3$


All this theoretical discussion might seem a bit tedious. But this knowledge is extremely useful when we 
examine the empirical results, for if the empirical results do not agree with prior expectations, then, assuming 
we have not committed a specification error (i.e., chosen the wrong model), we will have to modify our 
theory or look for a new theory and start the empirical enquiry all over again. But as noted in the Introduction, 
this is the nature of any empirical investigation. 
Figure 7.2 The total cost curve.


¹⁸See Alpha C. Chiang, Fundamental Methods of Mathematical Economics, 3d ed., McGraw-Hill, New York, 1984, pp. 250-252.
Figure 7.3 Short-run cost functions.


Empirical Results. When the third-degree polynomial regression was fitted to the data of Table 7.4, we obtained the following results:

$$\hat{Y}_i = 141.7667 + 63.4776 X_i - 12.9615 X_i^2 + 0.9396 X_i^3 \qquad (7.10.6)$$
$$(6.3753)\quad (4.7786)\quad (0.9857)\quad (0.0591) \qquad R^2 = 0.9983$$

(Note: The figures in parentheses are the estimated standard errors.) Although we will examine the statistical significance of these results in the next chapter, the reader can verify that they are in conformity with the theoretical expectations listed in Eq. (7.10.5). We leave it as an exercise for the reader to interpret the regression (7.10.6).
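The conformity check invited by the text can be written out. The sketch below tests the three restrictions in Eq. (7.10.5) against the estimates in (7.10.6) and also locates the minimum of the implied marginal cost curve $MC = \beta_1 + 2\beta_2 X + 3\beta_3 X^2$. Setting $dMC/dX = 0$ gives $X = -\beta_2/(3\beta_3)$ and a minimum value of $\beta_1 - \beta_2^2/(3\beta_3)$, so with $\beta_3 > 0$ restriction 3 is exactly the condition that marginal cost stays positive.

```python
# Estimated coefficients of the cubic total cost function (7.10.6)
b0, b1, b2, b3 = 141.7667, 63.4776, -12.9615, 0.9396


def check_cost_restrictions(b0, b1, b2, b3):
    """Evaluate the three restrictions of Eq. (7.10.5)."""
    return {
        "b0, b1, b3 > 0": b0 > 0 and b1 > 0 and b3 > 0,
        "b2 < 0": b2 < 0,
        "b2**2 < 3*b1*b3": b2 ** 2 < 3 * b1 * b3,
    }


print(check_cost_restrictions(b0, b1, b2, b3))

# Implied marginal cost MC(X) = b1 + 2*b2*X + 3*b3*X**2 is a parabola opening upward (b3 > 0)
x_min = -b2 / (3 * b3)            # output level at which MC is lowest
mc_min = b1 - b2 ** 2 / (3 * b3)  # lowest MC; positive exactly when restriction 3 holds
print(f"MC is lowest at X = {x_min:.2f}, where MC = {mc_min:.2f}")
```

With these estimates marginal cost bottoms out at an output of about 4.6 units, where it is still positive (about 3.88), as the theory requires.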


Example 7.5 GDP Growth Rate and Relative per Capita GDP for 2007 in 190 Countries (in billions 
of 2000 dollars) 


As an additional economic example of the polynomial regression model, consider the following regression 
results: 


$$\widehat{\text{GDPG}}_i = 5.5347 - 5.5788\,\text{RGDP}_i + 2.8378\,\text{RGDP}_i^2 \qquad (7.10.7)$$
$$\text{se} = (0.2435)\quad (1.5995)\quad (1.4391)$$
$$R^2 = 0.1092 \qquad \text{adj } R^2 = 0.0996$$

where GDPG = GDP growth rate, percent in 2007, and RGDP = relative per capita GDP in 2007 (percentage of U.S. GDP per capita, 2007). The adjusted $R^2$ (adj $R^2$) tells us that after taking into account the number of regressors, the model explains only about 9.96 percent of the variation in GDPG. Even the unadjusted $R^2$ of 0.1092 seems low. This might seem to be a disappointing value, but as we shall show in the next chapter, such low $R^2$ values are frequently encountered in cross-sectional data with a large number of observations. Besides, even an apparently low $R^2$ value can be statistically significant (i.e., different from zero), a point also taken up there.


Source: World Bank World Development Indicators, adjusted to 2000 base and estimated and projected values developed by the Economic Research Service.


7.11 Partial Correlation Coefficients*
Explanation of Simple and Partial Correlation Coefficients


In Chapter 3 we introduced the coefficient of correlation r as a measure of the degree of linear association between two variables. For the three-variable regression model we can compute three correlation coefficients: $r_{12}$ (correlation between Y and $X_2$), $r_{13}$ (correlation coefficient between Y and $X_3$), and $r_{23}$ (correlation coefficient between $X_2$ and $X_3$); notice that we are letting the subscript 1 represent Y for notational convenience. These correlation coefficients are called gross or simple correlation coefficients, or correlation coefficients of zero order. These coefficients can be computed by the definition of correlation coefficient given in Eq. (3.5.13).

But now consider this question: Does, say, $r_{12}$ in fact measure the “true” degree of (linear) association between Y and $X_2$ when a third variable $X_3$ may be associated with both of them? This question is analogous to the following question: Suppose the true regression model is (7.1.1) but we omit from the model the variable $X_3$ and simply regress Y on $X_2$, obtaining the slope coefficient of, say, $b_{12}$. Will this coefficient be equal to the true coefficient $\beta_2$ if the model (7.1.1) were estimated to begin with? The answer should be apparent from our discussion in Section 7.7. In general, $r_{12}$ is not likely to reflect the true degree of association between Y and $X_2$ in the presence of $X_3$. As a matter of fact, it is likely to give a false impression of the nature of association between Y and $X_2$, as will be shown shortly. Therefore, what we need is a correlation coefficient that is independent of the influence, if any, of $X_3$ on $X_2$ and Y. Such a correlation coefficient can be obtained and is known appropriately as the partial correlation coefficient. Conceptually, it is similar to the partial regression coefficient. We define


$r_{12.3}$ = partial correlation coefficient between Y and $X_2$, holding $X_3$ constant
$r_{13.2}$ = partial correlation coefficient between Y and $X_3$, holding $X_2$ constant
$r_{23.1}$ = partial correlation coefficient between $X_2$ and $X_3$, holding Y constant


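These partial correlations can be easily obtained from the simple or zero-order correlation coefficients as follows (for proofs, see the exercises):¹⁹

$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} \qquad (7.11.1)$$

$$r_{13.2} = \frac{r_{13} - r_{12} r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}} \qquad (7.11.2)$$

$$r_{23.1} = \frac{r_{23} - r_{12} r_{13}}{\sqrt{(1 - r_{12}^2)(1 - r_{13}^2)}} \qquad (7.11.3)$$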
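Eq. (7.11.1) can be checked against the idea behind a partial correlation: the correlation between what is left of Y and of $X_2$ after the linear influence of $X_3$ has been removed from each. The Python sketch below (NumPy assumed) does this on simulated data; the data-generating coefficients are hypothetical and serve only to make $X_3$ influence both Y and $X_2$.

```python
import numpy as np


def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation r_ab.c from zero-order correlations (pattern of Eq. 7.11.1)."""
    return (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def residuals(y, x):
    """Residuals from an OLS regression of y on a constant and x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


# Simulated data with hypothetical coefficients: X3 affects both Y and X2
rng = np.random.default_rng(0)
x3 = rng.normal(size=500)
x2 = 0.8 * x3 + rng.normal(size=500)
y = 1.0 + 0.5 * x2 - 1.2 * x3 + rng.normal(size=500)

r = np.corrcoef([y, x2, x3])  # rows/columns ordered as 1 = Y, 2 = X2, 3 = X3
r12, r13, r23 = r[0, 1], r[0, 2], r[1, 2]

r12_3 = partial_corr(r12, r13, r23)  # Eq. (7.11.1)
# Direct route: correlate Y and X2 after removing the linear influence of X3 from each
r12_3_direct = np.corrcoef(residuals(y, x3), residuals(x2, x3))[0, 1]
print(f"r12 = {r12:.4f}, r12.3 (formula) = {r12_3:.4f}, r12.3 (residuals) = {r12_3_direct:.4f}")
```

The same function gives $r_{13.2}$ as `partial_corr(r13, r12, r23)` and $r_{23.1}$ as `partial_corr(r23, r12, r13)`, mirroring Eqs. (7.11.2) and (7.11.3).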


*Optional. 


¹⁹Most computer programs for multiple regression analysis routinely compute the simple correlation coefficients; hence the partial correlation coefficients can be readily computed.
The partial correlations given in Eqs. (7.11.1) to (7.11.3) are called first-order correlation coefficients. By order we mean the number of secondary subscripts. Thus $r_{12.34}$ would be the correlation coefficient of order two, $r_{12.345}$ would be the correlation coefficient of order three, and so on. As noted previously, $r_{12}$, $r_{13}$, and so on are called simple or zero-order correlations. The interpretation of, say, $r_{12.34}$ is that it gives the coefficient of correlation between Y and $X_2$, holding $X_3$ and $X_4$ constant.


Interpretation of Simple and Partial Correlation Coefficients 


In the two-variable case, the simple r had a straightforward meaning: It measured the degree of (linear) 
association (and not causation) between the dependent variable Y and the single explanatory variable X. But 
once we go beyond the two-variable case, we need to pay careful attention to the interpretation of the simple 
correlation coefficient. From Eq. (7.11.1), for example, we observe the following: 

1. Even if r13 = 0, r,>3 will not be zero unless r}, or r3, or both are zero. 

2. If $r_{12} = 0$ and $r_{13}$ and $r_{23}$ are nonzero and are of the same sign, $r_{12.3}$ will be negative, whereas if they are of opposite signs, it will be positive. An example will make this point clear. Let $Y$ = crop yield, $X_2$ = rainfall, and $X_3$ = temperature. Assume $r_{12} = 0$, that is, no association between crop yield and rainfall. Assume further that $r_{13}$ is positive and $r_{23}$ is negative. Then, as Eq. (7.11.1) shows, $r_{12.3}$ will be positive; that is, holding temperature constant, there is a positive association between yield and rainfall. This seemingly paradoxical result, however, is not surprising. Since temperature $X_3$ affects both yield $Y$ and rainfall $X_2$, in order to find out the net relationship between crop yield and rainfall, we need to remove the influence of the "nuisance" variable temperature. This example shows how one might be misled by the simple coefficient of correlation.
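The crop-yield case can be reproduced with simulated data. In the hypothetical setup below, all coefficients are invented for illustration. The population correlation between yield and rainfall is zero, temperature is positively correlated with yield and negatively correlated with rainfall, and the population value of $r_{12.3}$ works out to $1/\sqrt{2} \approx 0.71$. This is a sketch assuming NumPy, not data from the text.

```python
import numpy as np

# Hypothetical data-generating process: temperature drives both variables,
# so yield and rainfall are (nearly) uncorrelated overall.
rng = np.random.default_rng(42)
n = 5000
temperature = rng.normal(size=n)                                  # X3
rainfall = -temperature + rng.normal(size=n)                      # X2, negatively related to X3
crop_yield = rainfall + 2.0 * temperature + rng.normal(size=n)    # Y

R = np.corrcoef([crop_yield, rainfall, temperature])
r12, r13, r23 = R[0, 1], R[0, 2], R[1, 2]
r12_3 = (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))  # Eq. (7.11.1)

print(f"r12 = {r12:.3f}, r13 = {r13:.3f}, r23 = {r23:.3f}, r12.3 = {r12_3:.3f}")
```

The simple correlation $r_{12}$ comes out close to zero, but once temperature is held constant the association between yield and rainfall is clearly positive.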

3. The terms $r_{12.3}$ and $r_{12}$ (and similar comparisons) need not have the same sign.

4. In the two-variable case we have seen that $r^2$ lies between 0 and 1. The same property holds true of the squared partial correlation coefficients. Using this fact, the reader should verify that one can obtain the following expression from Eq. (7.11.1):

$$0 \le r_{12}^2 + r_{13}^2 + r_{23}^2 - 2\,r_{12}\,r_{13}\,r_{23} \le 1 \qquad (7.11.4)$$


which gives the interrelationships among the three zero-order correlation coefficients. Similar expressions 
can be derived from Eqs. (7.11.2) and (7.11.3). 
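Eq. (7.11.4) gives a practical screen for whether three zero-order correlations could come from the same data set. Its upper bound is equivalent to requiring the 3 × 3 correlation matrix to have a nonnegative determinant, because that determinant equals $1 - (r_{12}^2 + r_{13}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23})$. The following minimal Python sketch implements the check; the function name is illustrative.

```python
def correlations_consistent(r12, r13, r23, tol=1e-12):
    """Check whether r12, r13, r23 can coexist, using Eq. (7.11.4).

    Each correlation must lie in [-1, 1], and the combination in (7.11.4)
    must not exceed 1 (its lower bound of 0 then holds automatically).
    """
    if any(abs(r) > 1 + tol for r in (r12, r13, r23)):
        return False
    q = r12**2 + r13**2 + r23**2 - 2 * r12 * r13 * r23
    return q <= 1 + tol

print(correlations_consistent(0.9, 0.9, 0.9))    # True
print(correlations_consistent(0.9, 0.9, -0.9))   # False
```

In the second call, $Y$ is strongly positively correlated with both $X_2$ and $X_3$, yet $X_2$ and $X_3$ are said to be strongly negatively correlated. No data set can produce that combination.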

5. Suppose that $r_{13} = r_{23} = 0$. Does this mean that $r_{12}$ is also zero? The answer is obvious from Eq. (7.11.4). The fact that $Y$ and $X_3$, and $X_2$ and $X_3$, are uncorrelated does not mean that $Y$ and $X_2$ are uncorrelated.

In passing, note that the expression $r_{12.3}^2$ may be called the coefficient of partial determination and may be interpreted as the proportion of the variation in $Y$ not explained by the variable $X_3$ that has been explained by the inclusion of $X_2$ into the model (see Exercise 7.5). Conceptually it is similar to $R^2$.

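Before moving on, note the following relationships between $R^2$, simple correlation coefficients, and partial correlation coefficients:

$$R^2 = \frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2} \qquad (7.11.5)$$

$$R^2 = r_{12}^2 + (1 - r_{12}^2)\,r_{13.2}^2 \qquad (7.11.6)$$

$$R^2 = r_{13}^2 + (1 - r_{13}^2)\,r_{12.3}^2 \qquad (7.11.7)$$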


In concluding this section, consider the following: It was stated previously that $R^2$ will not decrease if an additional explanatory variable is introduced into the model, which can be seen clearly from Eq. (7.11.6).


228 Basic Econometrics 


This equation states that the proportion of the variation in $Y$ explained by $X_2$ and $X_3$ jointly is the sum of two parts: the part explained by $X_2$ alone ($= r_{12}^2$) and the part not explained by $X_2$ ($= 1 - r_{12}^2$) times the proportion that is explained by $X_3$ after holding the influence of $X_2$ constant. Now $R^2 > r_{12}^2$ so long as $r_{13.2}^2 > 0$. At worst, $r_{13.2}^2$ will be zero, in which case $R^2 = r_{12}^2$.
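Eqs. (7.11.5) to (7.11.7) and the inequality $R^2 \ge r_{12}^2$ can be confirmed numerically. The sketch below uses Python with NumPy on data simulated under arbitrary, assumed coefficients. It computes $R^2$ directly from an OLS fit and from each of the three correlation formulas.

```python
import numpy as np

def r_squared(y, X):
    """R^2 of an OLS regression of y on the columns of X plus an intercept."""
    Z = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    e = y - Z @ b
    return 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)

# Hypothetical data
rng = np.random.default_rng(7)
n = 150
x2 = rng.normal(size=n)
x3 = 0.4 * x2 + rng.normal(size=n)
y = 2.0 + 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)

R = np.corrcoef([y, x2, x3])
r12, r13, r23 = R[0, 1], R[0, 2], R[1, 2]
r12_3 = (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))  # Eq. (7.11.1)
r13_2 = (r13 - r12 * r23) / np.sqrt((1 - r12**2) * (1 - r23**2))  # Eq. (7.11.2)

R2_direct = r_squared(y, np.column_stack([x2, x3]))
R2_eq5 = (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)  # Eq. (7.11.5)
R2_eq6 = r12**2 + (1 - r12**2) * r13_2**2                         # Eq. (7.11.6)
R2_eq7 = r13**2 + (1 - r13**2) * r12_3**2                         # Eq. (7.11.7)
print(R2_direct, R2_eq5, R2_eq6, R2_eq7, r12**2)
```

All four $R^2$ values agree, and none is smaller than $r_{12}^2$.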


Summary and Conclusions 


1. This chapter introduced the simplest possible multiple linear regression model, namely, the three- 
variable regression model. It is understood that the term linear refers to linearity in the parameters and 
not necessarily in the variables. 

2. Although a three-variable regression model is in many ways an extension of the two-variable model, there are some new concepts involved, such as partial regression coefficients, partial correlation coefficients, multiple correlation coefficient, adjusted and unadjusted (for degrees of freedom) $R^2$, multicollinearity, and specification bias.

3. This chapter also considered the functional form of the multiple regression model, such as the Cobb- 
Douglas production function and the polynomial regression model. 

4. Although $R^2$ and adjusted $R^2$ are overall measures of how well the chosen model fits a given set of data, their importance should not be overplayed. What is critical are the underlying theoretical expectations about the model in terms of the a priori signs of the coefficients of the variables entering the model and, as shown in the following chapter, their statistical significance.

5. The results presented in this chapter can be easily generalized to a multiple linear regression model 
involving any number of regressors. But the algebra becomes very tedious. This tedium can be avoided 
by resorting to matrix algebra. For the interested reader, the extension to the k-variable regression 
model using matrix algebra is presented in Appendix C, which is optional. But the general reader can 
read the remainder of the text without knowing much of matrix algebra. 
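For readers curious about what the matrix approach looks like in practice, here is a minimal sketch assuming NumPy. The regressors, including a column of ones for the intercept, are stacked in an $n \times k$ matrix $X$, and the OLS estimator solves the normal equations $(X'X)\hat{\beta} = X'y$. The data are simulated with made-up coefficients, and the function name is illustrative.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients for a k-variable model by solving (X'X) b = X'y.

    X must already contain a column of ones if an intercept is wanted.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])   # intercept + 3 regressors
beta_true = np.array([2.0, 0.5, -1.0, 3.0])                  # assumed for the simulation
y = X @ beta_true + rng.normal(scale=0.1, size=n)

b = ols(X, y)
print(np.round(b, 3))
```

Solving the normal equations directly is fine for small, well-conditioned problems. `np.linalg.lstsq` is numerically safer when the regressors are nearly collinear.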


Multiple Choice Questions 


1. The simplest possible multiple regression model is a 
a. One variable model 
b. Two variable model 
c. Three variable model 
d. Multi-variable model 
2. Multiple linear regression models
a. are linear in the parameters and linear in the variables
b. are linear in the parameters but may not be linear in the variables
c. may not be linear in the parameters but are linear in the variables
d. may not be linear in the parameters or the variables
3. $Y_i = \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$, where $X_{1i} = 1$ for all $i$. This is an example of a
a. Three variable model
b. X variable model
c. Four variable model
d. Three beta model
4. In $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$, the partial regression coefficients are given by
a. $\beta_1$ and $\beta_2$
b. $\beta_2$ and $\beta_3$
c. $\beta_1$ and $\beta_3$
d. $\beta_1$ and $u_i$
5. In the classical linear regression model, $\text{Var}(u_i) = \sigma^2$ refers to the assumption of
a. Zero mean value of disturbance term 
b. Homoscedasticity 
c. No autocorrelation 
d. No multicollinearity 
6. In the classical linear regression model, the condition that $\lambda_2 X_{2i} + \lambda_3 X_{3i} = 0$ holds only when $\lambda_2 = \lambda_3 = 0$ refers to the assumption of
a. Zero mean value of disturbance term 
b. Homoscedasticity 
c. No autocorrelation 
d. No multicollinearity 
7. In the classical linear regression model, $\text{Cov}(u_i, u_j) = 0$, $i \neq j$, refers to the assumption of
a. Zero mean value of disturbance term 
b. Homoscedasticity 
c. No autocorrelation 
d. No multicollinearity 
8. The assumption of no multicollinearity means that
a. There should be no correlation among the regressors.
b. There should be no linear relationship among the regressors.
c. There should be no nonlinear relationship among the regressors.
d. There should be no relationship among the regressors.
9. Given $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$, state which of the following statements is true:
a. $\beta_2$ measures the change in the mean value of $Y$ per unit change in $X_2$, holding the value of $X_3$ constant
b. $\beta_2$ gives the net effect of a unit change in $X_2$ on the mean value of $Y$, net of any effect that $X_3$ may have on mean $Y$
c. Both a and b are true
d. Neither a nor b is true
10. The measure of the proportion or percentage of the variation in $Y$ explained by the explanatory variables ($X_2$, $X_3$, ...) jointly is given by
a.
b. $R^2$
c. $\bar{R}^2$
d. $R$
11. The multiple coefficient of determination measures the
a. Goodness of fit of the multiple regression model
b. Homoscedasticity of the multiple regression model
c. Heteroscedasticity of the multiple regression model
d. Multicollinearity of the multiple regression model
12. Let the regression results for the impact of per capita GNP (PGNP) and female literacy rate (FLR) on child mortality (CM) be as given below:
$$CM^* = 0.40\,PGNP^* - 0.04\,FLR^*$$
where starred variables indicate standardized variables. Can we say that
a. PGNP has a lower impact on CM than FLR
b. FLR has a lower impact on CM than PGNP
c. The coefficients cannot be compared directly
d. The impact depends on the $t$ value
13. As the number of explanatory variables in a regression model increases, the $R^2$ value
a. Definitely decreases
b. Definitely increases
c. Definitely will not decrease
d. Definitely will not increase
14. When $R^2 = 1$, $\bar{R}^2$ would be equal to
a. 0
b. +1
c. -1
d. Less than 1
15. $\bar{R}^2$ can take values
a. Between 0 and 1
b. Between -1 and 1
c. Between -1 and 0
d. Less than +1
16. The value of $\bar{R}^2$ is always less than $R^2$. This statement is
a. Incorrect
b. Correct
c. Depends on the value of k
d. Depends on the value of n
17. In comparing two models on the basis of goodness of fit,
a. The sample size must be the same
b. The dependent variable must be the same
c. The independent variables must be the same
d. Both a and b above
18. The Cobb-Douglas production function is represented by

a. $Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} e^{u_i}$

b. a mae" 

c. $Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} + e^{u_i}$

d. Y= pa we 

19. A quadratic function is represented by

a. ¥,= Byt+ BX; +u,; 

b. $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i$

c. $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + u_i$
Y, = By + BX; + u; 
20. If the correlation between $Y$ and $X_2$ is zero ($r_{12} = 0$), then the partial correlation coefficient between $Y$ and $X_2$, holding $X_3$ constant ($r_{12.3}$), would be
a. Positive value
b. Negative value
c. Zero
d. Any of the above


Exercises 
Questions 
7.1. Consider the data in Table 7.5.

Table 7.5
Y    X2    X3
1    1      2
3    2      1
8    3     -3

Based on these data, estimate the following regressions:

$$Y_i = \alpha_1 + \alpha_2 X_{2i} + u_{1i} \qquad (1)$$
$$Y_i = \lambda_1 + \lambda_3 X_{3i} + u_{2i} \qquad (2)$$
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \qquad (3)$$

Note: Estimate only the coefficients and not the standard errors.
a. Is $\alpha_2 = \beta_2$? Why or why not?
b. Is $\lambda_3 = \beta_3$? Why or why not?
What important conclusion do you draw from this exercise?
7.2. From the following data estimate the partial regression coefficients, their standard errors, and the adjusted and unadjusted $R^2$ values:

$\bar{Y} = 367.693$    $\bar{X}_2 = 402.760$    $\bar{X}_3 = 8.0$
$\sum (Y_i - \bar{Y})^2 = 66042.269$    $\sum (X_{2i} - \bar{X}_2)^2 = 84855.096$
$\sum (X_{3i} - \bar{X}_3)^2 = 280.000$    $\sum (Y_i - \bar{Y})(X_{2i} - \bar{X}_2) = 74778.346$
$\sum (Y_i - \bar{Y})(X_{3i} - \bar{X}_3) = 4250.900$    $\sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3) = 4796.000$
$n = 15$
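7.3. Show that Eq. (7.4.7) can also be expressed as

$$\hat{\beta}_2 = \frac{\sum y_i\,(x_{2i} - b_{23}\,x_{3i})}{\sum (x_{2i} - b_{23}\,x_{3i})^2} = \frac{\text{net (of } x_3\text{) covariation between } y \text{ and } x_2}{\text{net (of } x_3\text{) variation in } x_2}$$

where $b_{23}$ is the slope coefficient in the regression of $X_2$ on $X_3$. (Hint: Recall that $b_{23} = \sum x_{2i} x_{3i} / \sum x_{3i}^2$.)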
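The identity in Exercise 7.3 is a special case of what is often called the Frisch–Waugh–Lovell result, and it can be checked numerically before being proved. In the Python sketch below, the data are simulated under assumed coefficients, and the lowercase names denote deviations from sample means, as in the text.

```python
import numpy as np

# Hypothetical data
rng = np.random.default_rng(2)
n = 100
X3 = rng.normal(10.0, 2.0, n)
X2 = 0.5 * X3 + rng.normal(0.0, 1.0, n)
Y = 3.0 + 1.5 * X2 - 0.7 * X3 + rng.normal(0.0, 1.0, n)

# Deviations from sample means (the lowercase y, x2, x3 of the text)
y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()

b23 = (x2 @ x3) / (x3 @ x3)      # slope from regressing X2 on X3
x2_net = x2 - b23 * x3           # X2 "net of" X3
beta2_net = (y @ x2_net) / (x2_net @ x2_net)

# beta_2 from the full three-variable OLS regression, for comparison
Z = np.column_stack([np.ones(n), X2, X3])
beta2_ols = np.linalg.lstsq(Z, Y, rcond=None)[0][1]
print(beta2_net, beta2_ols)
```

The two estimates coincide. The numerical check is not a proof, but it shows what the exercise asks you to establish.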
7.4. In a multiple regression model you are told that the error term $u_i$ has the following probability distribution, namely, $u_i \sim N(0, 4)$. How would you set up a Monte Carlo experiment to verify that the true variance is in fact 4?
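Here is one possible design, sketched in Python under assumed values. Fix the regressors and the true coefficients (both made up here), repeatedly draw $u_i \sim N(0, 4)$, generate $Y$, fit OLS, and record $\hat{\sigma}^2 = \text{RSS}/(n - k)$. If everything is set up correctly, the average of the $\hat{\sigma}^2$ values across replications should be close to 4.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, n_reps = 100, 3, 1000
true_var = 4.0

# Hypothetical fixed regressors and true coefficients
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
beta = np.array([1.0, 2.0, -0.5])

sigma2_hat = np.empty(n_reps)
for r in range(n_reps):
    u = rng.normal(0.0, np.sqrt(true_var), n)     # scale argument is the standard deviation
    y = X @ beta + u
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    sigma2_hat[r] = resid @ resid / (n - k)       # unbiased estimator of sigma^2

print(sigma2_hat.mean())
```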

7.5. Show that $r_{13.2}^2 = (R^2 - r_{12}^2)/(1 - r_{12}^2)$ and interpret the equation.

7.6. If the relation $\alpha_1 X_1 + \alpha_2 X_2 + \alpha_3 X_3 = 0$ holds true for all values of $X_1$, $X_2$, and $X_3$, find the values of the three partial correlation coefficients.

7.7. Is it possible to obtain the following from a set of data?

a. $r_{23} = 0.9$, $r_{13} = 0.2$, $r_{12} = 0.8$
b. $r_{12} = 0.6$, $r_{23} = -0.9$, $r_{31} = -0.5$
c. $r_{21} = 0.01$, $r_{13} = 0.66$, $r_{23} = -0.7$

7.8. Consider the following model:

$$Y_i = \beta_1 + \beta_2\,\text{Education}_i + \beta_3\,\text{Years of experience}_i + u_i$$

Suppose you leave out the years of experience variable. What kinds of problems or biases would you expect? Explain verbally.

7.9. Show that $\beta_2$ and $\beta_3$ in Eq. (7.9.2) do, in fact, give output elasticities of labor and capital. (This question can be answered without using calculus; just recall the definition of the elasticity coefficient and remember that a change in the logarithm of a variable is a relative change, assuming the changes are rather small.)

7.10. Consider the three-variable linear regression model discussed in this chapter. 
a. Suppose you multiply all the $X_2$ values by 2. What will be the effect of this rescaling, if any, on the estimates of the parameters and their standard errors?
b. Now instead of (a), suppose you multiply all the Y values by 2. What will be the effect of this, if 
any, on the estimated parameters and their standard errors? 


7.11. In general $R^2 \neq r_{12}^2 + r_{13}^2$, but it is so only if $r_{23} = 0$. Comment and point out the significance of this finding. (Hint: See Eq. [7.11.5].)
7.12. Consider the following models.*
Model A: $Y_t = \alpha_1 + \alpha_2 X_{2t} + \alpha_3 X_{3t} + u_{1t}$
Model B: $(Y_t - X_{2t}) = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_{2t}$
a. Will OLS estimates of $\alpha_1$ and $\beta_1$ be the same? Why?
b. Will OLS estimates of $\alpha_3$ and $\beta_3$ be the same? Why?
c. What is the relationship between $\alpha_2$ and $\beta_2$?
d. Can you compare the $R^2$ terms of the two models? Why or why not?
7.13. Suppose you estimate the consumption function†

$$Y_i = \alpha_1 + \alpha_2 X_i + u_{1i}$$

and the savings function

$$Z_i = \beta_1 + \beta_2 X_i + u_{2i}$$

where $Y$ = consumption, $Z$ = savings, $X$ = income, and $X = Y + Z$, that is, income is equal to consumption plus savings.


*Adapted from Wojciech W. Charemza and Derek F. Deadman, Econometric Practice: General to Specific Modelling, Cointegration and Vector Autoregression, Edward Elgar, Brookfield, Vermont, 1992, p. 18.

†Adapted from Peter Kennedy, A Guide to Econometrics, 3d ed., The MIT Press, Cambridge, Massachusetts, 1992, p. 308, Question #9.
a. What is the relationship, if any, between $\alpha_2$ and $\beta_2$? Show your calculations.
b. Will the residual sum of squares, RSS, be the same for the two models? Explain.
c. Can you compare the $R^2$ terms of the two models? Why or why not?


7.14. Suppose you express the Cobb-Douglas model given in Eq. (7.9.1) as follows:

$$Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} u_i$$

If you take the log-transform of this model, you will have $\ln u_i$ as the disturbance term on the right-hand side.
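Taking natural logarithms of both sides turns the multiplicative model into one that is linear in the parameters $\ln \beta_1$, $\beta_2$, and $\beta_3$, with $\ln u_i$ entering additively:

$$\ln Y_i = \ln \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + \ln u_i$$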


a. What probabilistic assumptions do you have to make about $\ln u_i$ to be able to apply the classical normal linear regression model (CNLRM)? How would you test this with the data given in Table 7.3?

b. Do the same assumptions apply to $u_i$? Why or why not?
7.15. Regression through the origin. Consider the following regression through the origin:

$$Y_i = \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i} + \hat{u}_i$$

a. How would you go about estimating the unknowns?
b. Will $\sum \hat{u}_i$ be zero for this model? Why or why not?
c. Will $\sum \hat{u}_i X_{2i} = \sum \hat{u}_i X_{3i} = 0$ for this model?
d. When would you use such a model?
e. Can you generalize your results to the k-variable model?

(Hint: Follow the discussion for the two-variable case given in Chapter 6.)


Empirical Exercises 
7.16. The demand for chicken in the states of India for 1992-93 is given in Table 7.6. The data for the following variables are given:
$Y$ = consumption of chicken in kg
$X_2$ = price of chicken in Rs per kg
$X_3$ = price of fish in Rs per kg
$X_4$ = per capita income in thousands of rupees at 1993-94 prices

You are asked to consider the following demand functions:

$$Y_i = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \alpha_4 X_{4i} + u_i$$
$$\ln Y_i = \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + \beta_4 \ln X_{4i} + u_i$$

a. Estimate the parameters of the linear model and interpret the results.

b. Estimate the parameters of the log-linear model and interpret the results.

c. $\beta_2$, $\beta_3$ and $\beta_4$ give, respectively, the own-price, cross-price and income elasticities of demand. What are their a priori signs? Do the results concur with the a priori expectations?

d. How would you compute the own-price, cross-price and income elasticities for the linear model?

e. On the basis of your analysis, which model, if either, would you choose and why?




Table 7.6 Demand for Chicken across Indian States

States   Y      X2      X3      X4
1        0.05   38.6    16.91   7.42
2        0.12   47.67   36.83   8.56
3        0.08   40.13   27.86   5.72
4        0.01   31      20.17   3.04
5        0.05   50.6    25.99   16.56
6        0.01   52      31.5    9.8
7        0.02   44.5    13.07   7.84
8        0.03   44      16.9    7.94
9        0.66   0.03    0.6     6.58
10       0.01   40      23.09   12.18
11       0.06   49.17   36.88   5.84
12       0.05   44.8    37.63   6.89
13       0.13   52.38   48.27   9.13
14       0.02   50      19.41   4.9
15       0.04   56.5    52      7.55
16       0.02   37.5    20.18   8.96
17       0.06   44.67   37.49   5.53
18       0.03   43      21.3    6.76
19       0.2    39.1    16.83   15.19
20       0.02   56      28.5    19.76
21       0.02   67.5    27.28   18.17
22       0.02   44      13.75   9.78


Source: Consumption of Some Important Commodities in India, Report no. 404, NSS 50th Round 1993-94, State tables, Table 1, pp. B1 to B64, National Sample Survey Organisation, Department of Statistics, Govt. of India.
Note: The data given pertain to items consumed per person for a period of 30 days across different states and Union Territories in rural India.


7.17. Wildcat activity. Wildcats are wells drilled to find and produce oil and/or gas in an improved area or to find a new reservoir in a field previously found to be productive of oil or gas or to extend the limit of a known oil or gas reservoir. Table 7.7 gives data on these variables:

$Y$ = the number of wildcats drilled
$X_2$ = price at the wellhead in the previous period (in constant dollars, 1972 = 100)
$X_3$ = domestic output
$X_4$ = GNP in constant dollars (1972 = 100)
$X_5$ = trend variable, 1948 = 1, 1949 = 2, ..., 1978 = 31

See if the following model fits the data:

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 \ln X_{3t} + \beta_4 X_{4t} + \beta_5 X_{5t} + u_t$$

a. Can you offer an a priori rationale for this model?

b. Assuming the model is acceptable, estimate the parameters of the model and their standard errors, and obtain $R^2$ and $\bar{R}^2$.

c. Comment on your results in view of your prior expectations.
d. What other specification would you suggest to explain wildcat activity? Why?

*I am indebted to Raymond Savino for collecting and processing these data.
Table 7.7 Wildcat Activity

Y = thousands of wildcats; X2 = per barrel price, constant dollars; X3 = domestic output (millions of barrels per day); X4 = GNP, constant dollars (billions); X5 = time

Y       X2     X3     X4         X5
8.01    4.89   5.52   487.67     1948 = 1
9.06    4.83   5.05   490.59     1949 = 2
10.31   4.68   5.41   533.55     1950 = 3
11.76   4.42   6.16   576.57     1951 = 4
12.43   4.36   6.26   598.62     1952 = 5
13.31   4.55   6.34   621.77     1953 = 6
13.10   4.66   6.81   613.67     1954 = 7
14.94   4.54   [?]    654.80     1955 = 8
16.17   4.44   [?]    668.84     1956 = 9
14.71   4.75   6.71   681.02     1957 = 10
13.20   4.56   7.05   679.53     1958 = 11
13.19   4.29   7.04   720.53     1959 = 12
11.70   4.19   7.18   736.86     1960 = 13
10.99   4.17   7.33   755.34     1961 = 14
10.80   4.11   7.54   799.15     1962 = 15
10.66   4.04   7.61   830.70     1963 = 16
10.75   3.96   7.80   874.29     1964 = 17
9.47    3.85   8.30   925.86     1965 = 18
10.31   3.75   8.81   980.98     1966 = 19
8.88    3.69   8.66   1,007.72   1967 = 20
8.88    3.56   8.78   1,051.83   1968 = 21
9.70    3.56   9.18   1,078.76   1969 = 22
7.69    3.48   9.03   1,075.31   1970 = 23
6.92    3.53   9.00   1,107.48   1971 = 24
7.54    3.39   8.78   1,171.10   1972 = 25
7.47    3.68   8.38   1,234.97   1973 = 26
8.63    5.92   8.01   1,217.81   1974 = 27
9.21    6.03   7.78   1,202.36   1975 = 28
9.23    6.12   7.88   1,271.01   1976 = 29
9.96    6.05   7.88   1,332.67   1977 = 30
10.78   5.89   8.67   1,385.10   1978 = 31


Source: Energy Information Administration, 1978 Report to Congress. 


7.18. U.S. defense budget outlays, 1962-1981. In order to explain the U.S. defense budget, you are asked to consider the following model:

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \beta_4 X_{4t} + \beta_5 X_{5t} + u_t$$

where $Y_t$ = defense budget outlay for year t, billions of dollars
$X_{2t}$ = GNP for year t, billions of dollars
$X_{3t}$ = U.S. military sales/assistance in year t, billions of dollars
$X_{4t}$ = aerospace industry sales, billions of dollars
$X_{5t}$ = military conflicts involving more than 100,000 troops. This variable takes a value of 1 when 100,000 or more troops are involved but is equal to zero when that number is under 100,000.
Table 7.8 U.S. Defense Budget Outlays, 1962-1981

Y = defense budget outlays; X2 = GNP; X3 = U.S. military sales/assistance; X4 = aerospace industry sales; X5 = conflicts involving 100,000+ troops

Year   Y       X2        X3     X4     X5
1962   [?]     560.3     0.6    16.0   [?]
1963   52.3    590.5     0.9    16.4   [?]
1964   53.6    632.4     1.1    16.7   [?]
1965   49.6    684.9     1.4    [?]    [?]
1966   56.8    749.9     1.6    20.2   [?]
1967   70.1    793.9     1.0    23.4   [?]
1968   80.5    865.0     0.8    25.6   [?]
1969   81.2    931.4     [?]    24.6   [?]
1970   80.3    992.7     1.0    24.8   [?]
1971   [?]     1,077.6   1.5    21.7   [?]
1972   78.3    1,185.9   2.95   21.5   [?]
1973   74.5    1,326.4   4.8    24.3   [?]
1974   77.8    1,434.2   10.3   26.8   [?]
1975   85.6    1,549.2   16.0   29.5   [?]
1976   89.4    1,718.0   14.7   30.4   [?]
1977   97.5    1,918.3   8.3    33.3   [?]
1978   105.2   2,163.9   [?]    38.0   [?]
1979   [?]     2,417.8   13.0   46.2   [?]
1980   135.9   2,633.1   [?]    [?]    [?]
1981   162.1   2,937.7   18.0   68.9   [?]

Source: These data were collected by Albert Lucchino from various government publications.


To test this model, you are given the data in Table 7.8.

a. Estimate the parameters of this model and their standard errors and obtain $R^2$, modified $R^2$, and $\bar{R}^2$.

b. Comment on the results, taking into account any prior expectations you have about the relationship between $Y$ and the various $X$ variables.

c. What other variable(s) might you want to include in the model and why?


7.19. The demand for potatoes in India, 1992-93. To study the per capita consumption of potatoes across the states in India, you are given the data in Table 7.9,

where $Y$ = per capita consumption of potatoes in kg
$X_2$ = income per capita in thousand rupees at 1993-94 prices
$X_3$ = price of potatoes in rupees per kg
$X_4$ = price of cabbage in rupees per kg
$X_5$ = price of cauliflower in rupees per kg

Now consider the following demand functions:

$$\ln Y_i = \gamma_1 + \gamma_2 X_{2i} + \gamma_3 X_{3i} + \gamma_4 X_{4i} + u_i \qquad (2)$$

$$\ln Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + \beta_5 X_{5i} + u_i \qquad (4)$$


Table 7.9 Demand for Potatoes in India, 1992-93

States   Y      X2      X3
1        0.15   6.58    5.47
2        0.17   7.42    5.18
3        0.21   7.94    6.14
4        0.23   7.84    4.35
5        0.38   12.18   4.84
6        0.49   9.13    7.8
7        0.54   7.55    3.93
8        0.54   9.78    4.83
9        0.57   8.56    4.82
10       0.71   5.84    4.96
11       0.81   6.54    3.84
12       0.85   9.8     4.68
13       0.88   7.87    3.74
14       0.98   11.08   3.36
15       1.11   6.18    3.26
16       [?]    8.96    4.29
17       1.34   12.71   [?]
18       1.54   18.17   3.27
19       2.36   6.76    2.55
20       2.4    15.19   2.88


2.1 
1.76 


2.4 


5:5 

5.27 
4.13 
6.77 
4.27 
37 

3.78 
5.74 
4.73 
2.91 
2.69 
25 
Source: Consumption of Some Important Commodities in India, Report no. 404, NSS 50th Round 1993-94, State tables, Table 1, pp. B1 to B64, National Sample Survey Organisation, Department of Statistics, Govt. of India.

Note: The data given pertain to items consumed per person for a period of 30 days across different states and Union Territories in rural India.


From microeconomic theory it is known that the demand for a commodity generally depends on the 
real income of the consumer, the real price of the commodity, and the real price of competing or 
complementary commodities. In view of these considerations, answer the following questions. 


a. Which demand function among the ones given here would you choose, and why?
b. How would you interpret the coefficients of $\ln X_{2i}$ and $\ln X_{3i}$ in these models?
c. What is the difference between specifications (2) and (4)?
d. What problem do you foresee if you adopt specification (4)? (Hint: Prices of both cabbage and cauliflower are included along with the price of potato.)
e. Are cabbage and/or cauliflower competing or substitute products to potato? How do you know?
f. Assume function (4) is the "correct" demand function. Estimate the parameters of this model, obtain their standard errors and $R^2$, $\bar{R}^2$, and modified $R^2$. Interpret your results.
g. Now suppose you run the "incorrect" model (2). Assess the consequences of this mis-specification by considering the values of $\gamma_2$ and $\gamma_3$ in relation to $\beta_2$ and $\beta_3$, respectively. (Hint: Pay attention to the discussion in Section 7.7.)
7.20. In a study of turnover in the labor market, James F. Ragan, Jr., obtained the following results for the U.S. economy for the period of 1950-I to 1979-IV.* (Figures in the parentheses are the estimated t statistics.)

$$\ln \hat{Y}_t = 4.47 - 0.84 \ln X_{2t} + 1.22 \ln X_{3t} + 1.22 \ln X_{4t} + 0.80 \ln X_{5t} - 0.0055\, X_{6t} \qquad R^2 = 0.5370$$
t = (4.28) (-5.31) (3.64) (3.10) (1.10) (-3.09)

Note: We will discuss the t statistics in the next chapter.
where $Y$ = quit rate in manufacturing, defined as number of people leaving jobs voluntarily per 100 employees
$X_2$ = an instrumental or proxy variable for adult male unemployment rate
$X_3$ = percentage of employees younger than 25
$X_4$ = $N_{t-1}/N_{t-4}$ = ratio of manufacturing employment in quarter (t - 1) to that in quarter (t - 4)
$X_5$ = percentage of women employees
$X_6$ = time trend (1950-I = 1)
a. Interpret the foregoing results.
b. Is the observed negative relationship between the logs of $Y$ and $X_2$ justifiable a priori?
c. Why is the coefficient of $\ln X_3$ positive?
d. Since the trend coefficient is negative, there is a secular decline of what percent in the quit rate and why is there such a decline?
e. Is the $R^2$ "too" low?
f. Can you estimate the standard errors of the regression coefficients from the given data? Why or why not?


7.21. Consider the following demand function for money in India for the period 1970-71 to 2004-05:

$$M_t = \beta_1 Y_t^{\beta_2} r_t^{\beta_3} e^{u_t}$$

where $M$ = real money demand, using the $M_1$ definition of money
$Y$ = real GDP
$r$ = interest rate

To estimate the above demand for money function, you are given the data in Table 7.10.
Note: To convert nominal quantities into real quantities, divide M by WPI. There is no need to divide the interest rate by WPI. What about GDP?
a. Given the data, estimate the above demand function. What are the income and interest rate elasticities of demand for money?
b. Instead of estimating the above demand function, suppose you were to fit the function $(M/Y)_t = \alpha\, r_t^{\beta} e^{u_t}$. How would you interpret the results? Show the necessary calculations.

c. How will you decide which is a better specification? (Note: A formal statistical test will be given in Chapter 8.)

*Source: See Ragan's article, "Turnover in the Labor Market: A Study of Quit and Layoff Rates," Economic Review, Federal Reserve Bank of Kansas City, May 1981, pp. 13-22.


Table 7.10 


Year 
1970-71 
1971-72 
1972-73 
1973-74 
1974-75 
1975-76 
1976-77 
1977-78 
1978-79 
1979-80 
1980-81 
1981-82 
1982-83 
1983-84 
1984-85 
1985-86 
1986-87 
1987-88 


Demand for Money in India, 1970-71 to 2004-05 


M1 
7374 
8323 
9700 
11200 
1975 
13825 
16024 
14388 
17292 
20000 
23424 
24937 
28535 
33398 
39915 
44095 
51516 
58555 


GDP 
474,131 
478,918 
477,392 
499,120 
504,914 
550,379 
557,258 
598,885 
631,839 
598,974 
641,921 
678,033 
697,861 
752,669 
782,484 
815,049 
850,217 
880,267 


Irate 
7.25
7.25
7.25
7.25


11 
11 
11 
11 
10 


WPI
14.3 
15.1 
16.7 
20.1 
25.1 
24.8 
25.3 
26.6 
26.6 
31.2 
36.9 
40.4 
41.4 
45.3 
48.5 
51.3 
54 
58.2 


Year 
1988-89 
1989-90 
1990-91 
1991-92 
1992-93 
1993-94 
1994-95 
1995-96 
1996-97 
1997-98 
1998-99 
1999-00 
2000-01 
2001-02 
2002-03 
2003-04 
2004-05 


M1 
66786 
81060 
92892 
114406 
124066 
150778 
192257 
214835 
240615 
267844 
309068 
341796 
379450 
422843 
473581 
578716 
647495 


GDP 

969,702 
1,029,178 
1,083,572 
1,099,072 
1,158,025 
1,223,816 
1,302,076 
1,396,974 
1,508,378 
1,573,263 
1,678,410 
1,786,526 
1,864,773 
1,972,912 
2,047,733 
2,222,591 
2,389,660 


irate 
10 
10 

11 
13 

ial 
10 

11 
13 
13 
liv 
ics 
10.5 
10 
8.25 
5.875 
5.375 
6 
WPI
62.2 
66.9 
ZOS 
33.9 
92.3 
100 
112.6 
121.6 
127.2
132.8 
140.7 
145.3 
155.7
161.3 
166.8 
175.9 
187.3 


Source: Handbook of Statistics on Indian Economy, 2009-10, RBI, Mumbai, and Handbook of Industrial Policy and Statistics 2007-08, Office of Economic Advisor, Govt. of India.

Note: M1 = currency with the public, demand deposits and other deposits with RBI (Rupee crore)
GDP = GDP at factor cost in 1999-2000 prices (Rs. crore)
Irate = interest rate = average commercial bank deposit rate for time deposits above 5 years (percent per annum)
WPI = wholesale price index (base 1993-94 = 100)


7.22. Table 7.11 gives data for the Indian public sector for the period 1985 to 2000.

a. See if the Cobb-Douglas production function fits the data given in the table and interpret the results. What general conclusion do you draw?
b. Now consider the following model:

$$\text{Output}/\text{labor} = A\,(K/L)^{\beta} e^{u}$$

where the regressand represents labor productivity and the regressor represents the capital-labor ratio. What is the economic significance of such a relationship, if any? Estimate the parameters of this model and interpret your results.
Table 7.11 Indian Public Sector Enterprises

Year   Output   Capital   Labour
1985   443.34   461.17    1214722
1986   379.05   536.1     1607128
1987   413.52   509.88    1545389
1988   466.2    677.5     1664926
1989   473.05   610.72    1593680
1990   544.1    598.71    1609015
1991   508.71   601.06    1626806
1992   468.36   585.62    1543411
1993   451.3    429.62    1568153
1994   413.1    527.87    1565944
1995   391.15   468.95    1530029
1996   445.07   451.34    1477010
1997   406.75   446.3     1468938
1998   388.31   446.51    1467611
1999   266.69   509.39    1434503
2000   307.4    488.33    1371988

Source: The data is borrowed from Sangeetha (2008), "Essays on Performance Contract and Ownership Reforms in Public Sector Enterprises: Evidence from India," PhD thesis, Indira Gandhi Institute of Development Research, Mumbai, India.
Note: Output = nominal output deflated by WPI (base 1981-82 = 100), in Rupee crore
Capital = capital employed deflated by WPI (base 1981-82 = 100), in Rupee crore
Labour = number of labour employed


7.23. Monte Carlo experiment: Consider the following model:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$

You are told that $\beta_1 = 262$, $\beta_2 = -0.006$, $\beta_3 = -2.4$, $\sigma^2 = 42$, and $u_i \sim N(0, 42)$. Generate 10 sets of 64 observations on $u_i$ from the given normal distribution and use the 64 observations given in Table 6.4, where $Y$ = CM, $X_2$ = PGNP, and $X_3$ = FLR, to generate 10 sets of the estimated $\beta$ coefficients (each set will have the three estimated parameters). Take the averages of each of the estimated $\beta$ coefficients and relate them to the true values of these coefficients given above. What overall conclusion do you draw?
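The mechanics can be sketched in Python, assuming NumPy. Table 6.4 is not reproduced in this section, so the PGNP and FLR values below are hypothetical stand-ins drawn at random. To carry out the exercise itself, replace them with the 64 observations from Table 6.4.

```python
import numpy as np

rng = np.random.default_rng(4)
n_obs, n_sets = 64, 10
beta_true = np.array([262.0, -0.006, -2.4])
sigma2 = 42.0

# Hypothetical stand-ins for PGNP (X2) and FLR (X3); use Table 6.4 in practice.
pgnp = rng.uniform(100, 20000, n_obs)
flr = rng.uniform(10, 90, n_obs)
X = np.column_stack([np.ones(n_obs), pgnp, flr])

estimates = np.empty((n_sets, 3))
for s in range(n_sets):
    u = rng.normal(0.0, np.sqrt(sigma2), n_obs)   # u_i ~ N(0, 42)
    cm = X @ beta_true + u                        # generated CM values
    estimates[s] = np.linalg.lstsq(X, cm, rcond=None)[0]

print(estimates.mean(axis=0))   # compare with beta_true
```

With only 10 replications the averages will not match the true values exactly. Increasing `n_sets` shows them settling near $\beta_1$, $\beta_2$, and $\beta_3$.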
7.24. Table 7.12 gives data for real consumption expenditure, real income, real wealth, and real interest rates 
for the U.S. for the years 1947-2000. These data will be used again for Exercise 8.35. 
a. Given the data in the table, estimate the linear consumption function using income, wealth, and 
interest rate. What is the fitted equation? 
b. What do the estimated coefficients indicate about the variables’ relationships to consumption 
expenditure? 


Table 7.12 Real Consumption Expenditure, Real Income, Real Wealth, and Real Interest Rates for the U.S., 1947-2000


Year 


1947 
1948 
1949 
1950 
1951 
1952 
1953 
1954 
1955 
1956 
1957 
1958 
1959 
1960 
1961 
1962 
1963 
1964 
1965 
1966 
1967 
1968 
1969 
1970 
1971 
1972 
1973 
1974 
1975 
1976 
1977 
1978 
17/8) 
1980 
1981 
1982 
1983 
1984 
1985 
1986 
1987 
1988 
1989 
1990 


1035.2 
1090.0 
1095.6 
1927 
1227.0 
1266.8 
1327.5 
1344.0 
1433.8 
1502.3 
15395 
15537 
1623.8 
1664.8 
1720.0 
1803.5 
1871.5 
2006.9 
2131.0 
2244.6 
2340.5 
2448.2 
2524.3 
2630.0 
2745.3 
2874.3 
3072.3 
3051.9 
3108.5 
3243.5 
3360.7 
3527.5 
3628.6 
3658.0 
3741.1 
3791.7 
3906.9 
4207.6 
4347.8 
4486.6 
4582.5 
4784.1 
4906.5 
5014.2 


Wealth 


5166.8 
5280.8 
5607.4 
5759.5 
6086.1 
6243.9 
6355.6 
6797.0 
772.2 
PPO: 2 
7315.3 
7870.0 
8188.1 
8351.8 
8971.9 
9091.5 
9436.1 
10003.4 
10562.8 
10522.0 
S2 
12145.4 
11672.3 
11650.0 
125129 
13499.9 
13081.0 
11868.8 
12634.4 
13456.8 
13786.3 
14450.5 
15340.0 
15965.0 
15965.0 
16312.5 
16944.8 
17526.7 
19068.3 
20530.0 
21235.7 
22332.0 
23659.8 
23105.1 


Interest Rate 


—10.351 
—4.720 
1.044 
0.407 
—5.283 
—0.277 
0.561 
—0.138 
0.262 
—0.736 
—0.261 
—0.575 
2.296 
1.511 
1.296 
1.396 
2.058 
2.027 
2AN 2 
2.020 
1213 
1.055 
1732 
1.166 
—0.712 
—0.156 
1.414 
—1.043 
—3.534 
—0.657 
—1.190 
0.113 
1.704 
2.298 
4.704 
4.449 
4.691 
5.848 
4.331 
3.768 
2.819 
3.287 
4.318 
3.3995 


(Contd) 




(Contd) 
Year   C        Yd       Wealth    Interest Rate
1991   4466.6   5033.0   24050.2   1.803
1992   4594.5   5189.3   24418.2   1.007
1993   4748.9   5261.3   25092.3   0.625
1994   4928.1   5397.2   25218.6   2.206
1995   5075.6   5539.1   27439.7   3.338
1996   5237.5   5677.7   29448.2   3.083
1997   5423.9   5854.5   32664.1   3.120
1998   5683.7   6168.6   35587.0   3.584
1999   5968.4   6320.0   39591.3   3.245
2000   6257.8   6539.2   38167.7   3.576


Notes: Year = calendar year.
C = real consumption expenditures in billions of chained 1996 dollars.
Yd = real personal disposable income in billions of chained 1996 dollars.
Wealth = real wealth in billions of chained 1996 dollars.
Interest = nominal annual yield on 3-month Treasury securities minus the inflation rate (measured by the annual % change in the annual chained price index).

The nominal wealth variable was created using data from the Federal Reserve Board's measure of end-of-year net worth for households and nonprofits in the flow of funds accounts. The price index used to convert this nominal wealth variable to a real wealth variable was the average of the chained price index from the 4th quarter of the current year and the 1st quarter of the subsequent year.

Sources: C, Yd, and quarterly and annual chain-type price indexes (1996 = 100): Bureau of Economic Analysis, U.S. Department of Commerce (http://www.bea.doc.gov/bea/dn1.htm).

Nominal annual yield on 3-month Treasury securities: Economic Report of the President, 2002.

Nominal wealth = end-of-year nominal net worth of households and nonprofits (from Federal Reserve flow of funds data: http://www.federalreserve.gov).


7.25. Estimating Qualcomm stock prices. As an example of polynomial regression, consider data on the weekly stock prices of Qualcomm, Inc., a digital wireless telecommunications designer and manufacturer, over the period 1995 to 2000. The full data can be found on the textbook’s website in Table 7.13. During the late 1990s, technology stocks were particularly profitable, but what type of regression model will best fit these data? Figure 7.4 shows a basic plot of the data for those years.

This plot does seem to resemble an elongated S curve: at first there is only a slight increase in the average stock price, but then the rate of increase rises dramatically toward the far right side of the graph. As the demand for more specialized phones increased sharply and the technology boom got under way, the stock price followed suit and rose at a much faster rate.

a. Estimate a linear model to predict the closing stock price based on time. Does this model seem to 
fit the data well? 

b. Now estimate a quadratic model using both time and time squared. Is this a better fit than in (a)?

c. Finally, fit the following cubic or third-degree polynomial: 


$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + u_i$$


where Y = stock price and X = time. Which model seems to be the best estimator for the stock prices? 
Figure 7.4 Qualcomm stock prices over time.
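As a rough sketch of what parts (a) to (c) involve, the Python code below fits the linear, quadratic, and cubic trend models by OLS and compares their $R^2$. The Table 7.13 data are not reproduced here, so the prices are a hypothetical simulated series that accelerates late in the sample; the helpers `solve` and `fit_r2` are illustrative names, not part of any library. Time is rescaled to the interval (0, 1] so that the normal equations stay well conditioned.

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_r2(x, y, degree):
    """OLS fit of Y = b0 + b1 X + ... + b_d X^d via the normal equations; returns R^2."""
    rows = [[xi ** p for p in range(degree + 1)] for xi in x]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    b = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(b, r)) for r in rows]
    ybar = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - rss / tss

# Hypothetical stand-in for Table 7.13: 300 weekly prices that accelerate late in the sample.
random.seed(7)
x = [week / 300 for week in range(1, 301)]  # time rescaled to (0, 1]
y = [5 + 10 * xi + 80 * xi ** 3 + random.gauss(0, 2) for xi in x]

r2 = {d: fit_r2(x, y, d) for d in (1, 2, 3)}
for d, value in r2.items():
    print(f"degree {d}: R^2 = {value:.4f}")
```

Because each model nests the previous one, $R^2$ cannot fall as the degree rises; when comparing the three fits in the exercise, the adjusted $R^2$, which penalizes the extra terms, is the more informative yardstick.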
Key to Multiple Choice Questions
1. [illegible] 2. (b) 3. (a) 4. (b) 5. (b) 6. (d) 7. (c) 8. (b) 9. (c)
10. (b) 11. (a) 12.-14. [illegible] 15. (d) 16. (b) 17. (d) 18. (a)
19. (b) 20. (d)


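Appendix 7A

7A.1 Derivation of OLS Estimators Given in Equations (7.4.3) to (7.4.5)

Differentiating the equation

$$\sum \hat u_i^2 = \sum \left(Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)^2 \tag{7.4.2}$$

partially with respect to the three unknowns and setting the resulting equations to zero, we obtain

$$\begin{aligned}
\frac{\partial \sum \hat u_i^2}{\partial \hat\beta_1} &= 2\sum \left(Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)(-1) = 0\\
\frac{\partial \sum \hat u_i^2}{\partial \hat\beta_2} &= 2\sum \left(Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)(-X_{2i}) = 0\\
\frac{\partial \sum \hat u_i^2}{\partial \hat\beta_3} &= 2\sum \left(Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)(-X_{3i}) = 0
\end{aligned}$$

Simplifying these, we obtain Eqs. (7.4.3) to (7.4.5).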
In passing, note that the three preceding equations can also be written as

$$\begin{aligned}
\sum \hat u_i &= 0\\
\sum \hat u_i X_{2i} &= 0 \qquad \text{(Why?)}\\
\sum \hat u_i X_{3i} &= 0
\end{aligned}$$

which show the properties of the least-squares fit, namely, that the residuals sum to zero and that they are uncorrelated with the explanatory variables $X_2$ and $X_3$.

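Incidentally, notice that to obtain the OLS estimators of the k-variable linear regression model (7.4.20) we proceed analogously. Thus, we first write

$$\sum \hat u_i^2 = \sum \left(Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \cdots - \hat\beta_k X_{ki}\right)^2$$

Differentiating this expression partially with respect to each of the k unknowns, setting the resulting equations equal to zero, and rearranging, we obtain the following k normal equations in the k unknowns:

$$\begin{aligned}
\sum Y_i &= n\hat\beta_1 + \hat\beta_2 \sum X_{2i} + \hat\beta_3 \sum X_{3i} + \cdots + \hat\beta_k \sum X_{ki}\\
\sum Y_i X_{2i} &= \hat\beta_1 \sum X_{2i} + \hat\beta_2 \sum X_{2i}^2 + \hat\beta_3 \sum X_{2i} X_{3i} + \cdots + \hat\beta_k \sum X_{2i} X_{ki}\\
\sum Y_i X_{3i} &= \hat\beta_1 \sum X_{3i} + \hat\beta_2 \sum X_{2i} X_{3i} + \hat\beta_3 \sum X_{3i}^2 + \cdots + \hat\beta_k \sum X_{3i} X_{ki}\\
&\;\;\vdots\\
\sum Y_i X_{ki} &= \hat\beta_1 \sum X_{ki} + \hat\beta_2 \sum X_{2i} X_{ki} + \hat\beta_3 \sum X_{3i} X_{ki} + \cdots + \hat\beta_k \sum X_{ki}^2
\end{aligned}$$

Or, switching to small letters, these equations can be expressed as

$$\begin{aligned}
\sum y_i x_{2i} &= \hat\beta_2 \sum x_{2i}^2 + \hat\beta_3 \sum x_{2i} x_{3i} + \cdots + \hat\beta_k \sum x_{2i} x_{ki}\\
\sum y_i x_{3i} &= \hat\beta_2 \sum x_{2i} x_{3i} + \hat\beta_3 \sum x_{3i}^2 + \cdots + \hat\beta_k \sum x_{3i} x_{ki}\\
&\;\;\vdots\\
\sum y_i x_{ki} &= \hat\beta_2 \sum x_{2i} x_{ki} + \hat\beta_3 \sum x_{3i} x_{ki} + \cdots + \hat\beta_k \sum x_{ki}^2
\end{aligned}$$

It should further be noted that the k-variable model also satisfies these equations:

$$\begin{aligned}
\sum \hat u_i &= 0\\
\sum \hat u_i X_{2i} &= \sum \hat u_i X_{3i} = \cdots = \sum \hat u_i X_{ki} = 0
\end{aligned}$$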
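These properties can be checked numerically. The sketch below uses a small hypothetical data set (not from the text) and a hand-written Gaussian-elimination helper, `solve` (an illustrative name, not a library function), to solve the normal equations $(X'X)\hat\beta = X'Y$ for a three-variable model; it then confirms that the residuals sum to zero and are orthogonal to $X_2$ and $X_3$, up to floating-point rounding.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

# Hypothetical data (not from the text).
X2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
X3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0]
Y = [3.1, 3.9, 6.2, 6.8, 9.9, 10.1, 13.2, 15.0]

rows = [[1.0, a, b] for a, b in zip(X2, X3)]  # the leading 1 carries the intercept
k = len(rows[0])
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * y for r, y in zip(rows, Y)) for i in range(k)]
beta = solve(xtx, xty)  # solves the k normal equations

resid = [y - sum(b * x for b, x in zip(beta, r)) for y, r in zip(Y, rows)]
sum_u = sum(resid)
sum_u_x2 = sum(u * x for u, x in zip(resid, X2))
sum_u_x3 = sum(u * x for u, x in zip(resid, X3))
print("beta:", [round(b, 4) for b in beta])
print("sum u, sum u*X2, sum u*X3:", sum_u, sum_u_x2, sum_u_x3)
```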


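7A.2 Equality between the Coefficients of PGNP in Equations (7.3.5) and (7.6.2)

Letting Y = CM, $X_2$ = PGNP, and $X_3$ = FLR and using the deviation form, write

$$y_i = b_{13} x_{3i} + \hat u_{1i} \tag{1}$$

$$x_{2i} = b_{23} x_{3i} + \hat u_{2i} \tag{2}$$

Now regress $\hat u_1$ on $\hat u_2$ to obtain:

$$\hat\alpha_2 = \frac{\sum \hat u_{1i} \hat u_{2i}}{\sum \hat u_{2i}^2} = -0.0056 \quad \text{(for our example)} \tag{3}$$

Note that because the $\hat u$'s are residuals, their mean values are zero. Using (1) and (2), we can write (3) as

$$\hat\alpha_2 = \frac{\sum (y_i - b_{13} x_{3i})(x_{2i} - b_{23} x_{3i})}{\sum (x_{2i} - b_{23} x_{3i})^2} \tag{4}$$

Expand the preceding expression, and note that

$$b_{23} = \frac{\sum x_{2i} x_{3i}}{\sum x_{3i}^2} \tag{5}$$

and

$$b_{13} = \frac{\sum y_i x_{3i}}{\sum x_{3i}^2} \tag{6}$$

Making these substitutions into (4), we get

$$\hat\alpha_2 = \frac{\left(\sum y_i x_{2i}\right)\left(\sum x_{3i}^2\right) - \left(\sum y_i x_{3i}\right)\left(\sum x_{2i} x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2} = -0.0056 \quad \text{(for our example)}$$

The right-hand side is precisely $\hat\beta_2$ as given in Eq. (7.4.7), so both routes yield the same PGNP coefficient.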
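The equality can be verified on any data set. The sketch below uses hypothetical numbers (not the child mortality data): it regresses y on x3 and x2 on x3 in deviation form, regresses the first set of residuals on the second, and compares that slope with the deviation-form formula for $\hat\beta_2$. The helper names `dev` and `cross` are illustrative. (This result is known more generally as the Frisch-Waugh-Lovell theorem.)

```python
def dev(v):
    """Deviations from the mean (the book's small letters)."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def cross(a, b):
    """Cross-product sum: sum of a_i * b_i."""
    return sum(p * q for p, q in zip(a, b))

# Hypothetical data (not the child mortality data).
X2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
X3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0]
Y = [3.1, 3.9, 6.2, 6.8, 9.9, 10.1, 13.2, 15.0]
y, x2, x3 = dev(Y), dev(X2), dev(X3)

b13 = cross(y, x3) / cross(x3, x3)   # slope of y on x3
b23 = cross(x2, x3) / cross(x3, x3)  # slope of x2 on x3
u1 = [yi - b13 * x3i for yi, x3i in zip(y, x3)]     # residuals of y on x3
u2 = [x2i - b23 * x3i for x2i, x3i in zip(x2, x3)]  # residuals of x2 on x3
alpha2 = cross(u1, u2) / cross(u2, u2)              # residual-on-residual slope

# Deviation-form multiple-regression slope, Eq. (7.4.7).
beta2 = (cross(y, x2) * cross(x3, x3) - cross(y, x3) * cross(x2, x3)) / (
    cross(x2, x2) * cross(x3, x3) - cross(x2, x3) ** 2
)
print(alpha2, beta2)
```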


7A.3 Derivation of Equation (7.4.19)

Recall that

$$\hat u_i = Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}$$

which can also be written as

$$\hat u_i = y_i - \hat\beta_2 x_{2i} - \hat\beta_3 x_{3i}$$

where small letters, as usual, indicate deviations from mean values.

Now

$$\sum \hat u_i^2 = \sum \hat u_i \left(y_i - \hat\beta_2 x_{2i} - \hat\beta_3 x_{3i}\right) = \sum \hat u_i y_i$$

where use is made of the fact that $\sum \hat u_i x_{2i} = \sum \hat u_i x_{3i} = 0$. (Why?) Also

$$\sum \hat u_i y_i = \sum y_i \hat u_i = \sum y_i \left(y_i - \hat\beta_2 x_{2i} - \hat\beta_3 x_{3i}\right)$$

that is,

$$\sum \hat u_i^2 = \sum y_i^2 - \hat\beta_2 \sum y_i x_{2i} - \hat\beta_3 \sum y_i x_{3i} \tag{7.4.19}$$

which is the required result.
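A quick numerical check of Eq. (7.4.19): the sketch below, on hypothetical data, computes the two slopes from the deviation-form formulas (the $\hat\beta_2$ formula of Eq. (7.4.7) and its mirror image for $\hat\beta_3$), then compares the residual sum of squares computed directly with the right-hand side of (7.4.19). The helper names are illustrative.

```python
def dev(v):
    """Deviations from the mean."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def cross(a, b):
    """Cross-product sum: sum of a_i * b_i."""
    return sum(p * q for p, q in zip(a, b))

# Hypothetical data (not from the text).
X2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
X3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0]
Y = [3.1, 3.9, 6.2, 6.8, 9.9, 10.1, 13.2, 15.0]
y, x2, x3 = dev(Y), dev(X2), dev(X3)

den = cross(x2, x2) * cross(x3, x3) - cross(x2, x3) ** 2
b2 = (cross(y, x2) * cross(x3, x3) - cross(y, x3) * cross(x2, x3)) / den
b3 = (cross(y, x3) * cross(x2, x2) - cross(y, x2) * cross(x2, x3)) / den

u = [yi - b2 * a - b3 * c for yi, a, c in zip(y, x2, x3)]  # residuals in deviation form
rss_direct = cross(u, u)
rss_7419 = cross(y, y) - b2 * cross(y, x2) - b3 * cross(y, x3)  # right-hand side of (7.4.19)
print(rss_direct, rss_7419)
```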


7A.4 Maximum Likelihood Estimation of the Multiple Regression Model 


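Extending the ideas introduced in Chapter 4, Appendix 4A, we can write the log-likelihood function for the k-variable linear regression model (7.4.20) as

$$\ln L = -\frac{n}{2}\ln\sigma^2 - \frac{n}{2}\ln(2\pi) - \frac{1}{2}\sum \frac{\left(Y_i - \beta_1 - \beta_2 X_{2i} - \cdots - \beta_k X_{ki}\right)^2}{\sigma^2}$$

Differentiating this function partially with respect to $\beta_1, \beta_2, \ldots, \beta_k$ and $\sigma^2$, we obtain the following (k + 1) equations:

$$\begin{aligned}
\frac{\partial \ln L}{\partial \beta_1} &= -\frac{1}{\sigma^2}\sum \left(Y_i - \beta_1 - \beta_2 X_{2i} - \cdots - \beta_k X_{ki}\right)(-1) && (1)\\
\frac{\partial \ln L}{\partial \beta_2} &= -\frac{1}{\sigma^2}\sum \left(Y_i - \beta_1 - \beta_2 X_{2i} - \cdots - \beta_k X_{ki}\right)(-X_{2i}) && (2)\\
&\;\;\vdots\\
\frac{\partial \ln L}{\partial \beta_k} &= -\frac{1}{\sigma^2}\sum \left(Y_i - \beta_1 - \beta_2 X_{2i} - \cdots - \beta_k X_{ki}\right)(-X_{ki}) && (k)\\
\frac{\partial \ln L}{\partial \sigma^2} &= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum \left(Y_i - \beta_1 - \beta_2 X_{2i} - \cdots - \beta_k X_{ki}\right)^2 && (k+1)
\end{aligned}$$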
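Setting these equations equal to zero (the first-order condition for optimization) and letting $\tilde\beta_1, \tilde\beta_2, \ldots, \tilde\beta_k$ and $\tilde\sigma^2$ denote the ML estimators, we obtain, after simple algebraic manipulations,

$$\begin{aligned}
\sum Y_i &= n\tilde\beta_1 + \tilde\beta_2 \sum X_{2i} + \cdots + \tilde\beta_k \sum X_{ki}\\
\sum Y_i X_{2i} &= \tilde\beta_1 \sum X_{2i} + \tilde\beta_2 \sum X_{2i}^2 + \cdots + \tilde\beta_k \sum X_{2i} X_{ki}\\
&\;\;\vdots\\
\sum Y_i X_{ki} &= \tilde\beta_1 \sum X_{ki} + \tilde\beta_2 \sum X_{2i} X_{ki} + \cdots + \tilde\beta_k \sum X_{ki}^2
\end{aligned}$$

which are precisely the normal equations of the least-squares theory, as can be seen from Appendix 7A, Section 7A.1. Therefore, the ML estimators, the $\tilde\beta$'s, are the same as the OLS estimators, the $\hat\beta$'s, given previously. But as noted in Chapter 4, Appendix 4A, this equality is not accidental.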

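Substituting the ML (= OLS) estimators into the (k + 1)st equation just given, we obtain, after simplification, the ML estimator of $\sigma^2$ as

$$\tilde\sigma^2 = \frac{1}{n}\sum \left(Y_i - \tilde\beta_1 - \tilde\beta_2 X_{2i} - \cdots - \tilde\beta_k X_{ki}\right)^2 = \frac{1}{n}\sum \hat u_i^2$$

As noted in the text, this estimator differs from the OLS estimator $\hat\sigma^2 = \sum \hat u_i^2/(n - k)$. And since the latter is an unbiased estimator of $\sigma^2$, it follows that the ML estimator $\tilde\sigma^2$ is biased: $E(\tilde\sigma^2) = \frac{n - k}{n}\sigma^2$. But, as can be readily verified, asymptotically, $\tilde\sigma^2$ is unbiased too, because $(n - k)/n \to 1$ as $n \to \infty$.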
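To make the difference concrete, the sketch below computes both estimators from the residual sum of squares, sample size, and number of coefficients reported in the EViews output of Section 7A.5 (RSS = 3.415520, n = 51, k = 3). The square root of the unbiased estimator reproduces the "S.E. of regression" printed there, and the loop shows the bias factor $(n - k)/n$ approaching 1. The function name is illustrative.

```python
def sigma2_estimators(rss, n, k):
    """Return (ML estimator RSS/n, unbiased OLS estimator RSS/(n - k))."""
    return rss / n, rss / (n - k)

# RSS, n, and k from the EViews output in Section 7A.5 (Cobb-Douglas model).
ml, ols = sigma2_estimators(3.415520, n=51, k=3)
print(f"ML: {ml:.6f}  OLS: {ols:.6f}  sqrt(OLS): {ols ** 0.5:.6f}")

# E(ML estimator) = (n - k)/n * sigma^2, so the bias factor tends to 1 as n grows.
for n in (10, 51, 1000, 100000):
    print(n, round((n - 3) / n, 5))
```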
7A.5 EViews Output of the Cobb-Douglas Production Function in Equation (7.9.4)

Dependent Variable: Y1
Method: Least Squares
Included observations: 51

       Coefficient   Std. Error   t-Statistic   Prob.
C      3.887600      0.396228     9.811514      0.0000
Y2     0.468332      0.098926     4.734170      0.0000
Y3     0.521279      0.096887     5.380274      0.0000

R-squared             0.964175    Mean dependent var.      16.94139
Adjusted R-squared    0.962683    S.D. dependent var.      1.380870
S.E. of regression    0.266752    Akaike info criterion    0.252028
Sum squared resid.    3.415520    Schwarz criterion        0.365665
Log likelihood        -3.426721   Hannan-Quinn criterion   0.295452
F-statistic           645.9311    Durbin-Watson stat.      1.946387
Prob. (F-statistic)   0.000000

Covariance of Estimates
       C           Y2          Y3
C      0.156997    0.010364    -0.020014
Y2     0.010364    0.009786    -0.009205
Y3     -0.020014   -0.009205   0.009387
Y             X2          X3            Y1        Y2        Y3        Y1HAT     Y1RESID
38,372,840    424,471     2,689,076     17.4629   12.9586   14.8047   17.6739   -0.2110
1,805,427     19,895      57,997        14.4063   9.8982    10.9681   14.2407   0.1656
23,736,129    206,893     2,308,272     16.9825   12.2400   14.6520   17.2577   -0.2752
26,981,983    304,055     1,376,235     17.1107   12.6250   14.1349   17.1685   -0.0578
217,546,032   1,809,756   13,554,116    19.1979   14.4087   16.4222   19.1962   0.0017
19,462,751    180,366     1,790,751     16.7840   12.1027   14.3981   17.0612   -0.2771
28,972,772    224,267     1,210,229     17.1819   12.3206   14.0063   16.9589   0.2229
14,313,157    54,455      421,064       16.4767   10.9051   12.9505   15.7457   0.7310
159,921       2,029       7,188         11.9824   7.6153    8.8802    12.0831   -0.1007
47,289,846    471,211     2,761,281     17.6718   13.0631   14.8312   17.7366   -0.0648
63,015,125    659,379     3,540,475     17.9589   13.3991   15.0798   18.0236   -0.0647
1,809,052     17,528      146,371       14.4083   9.7716    11.8939   14.6640   -0.2557
10,511,786    75,414      848,220       16.1680   11.2307   13.6509   16.2632   -0.0952
105,324,866   963,156     5,870,409     18.4726   13.7780   15.5854   18.4646   0.0079
90,120,459    835,083     5,832,503     18.3167   13.6353   15.5790   18.3944   -0.0778
39,079,550    336,159     1,795,976     17.4811   12.7253   14.4011   17.3543   0.1269
22,826,760    246,144     [illegible]   16.9434   12.4137   14.2825   17.1465   -0.2030
38,686,340    384,484     2,503,693     17.4710   12.8597   14.7333   17.5903   -0.1193
69,910,555    216,149     4,726,625     18.0627   12.2837   15.3687   17.6519   0.4109
7,856,947     82,021      415,131       15.8769   11.3147   12.9363   15.9301   -0.0532
21,352,966    174,855     1,729,116     16.8767   12.0717   14.3631   17.0284   -0.1517
46,044,292    355,701     2,706,065     17.6451   12.7818   14.8110   17.5944   0.0507


(Contd)

Y             X2            X3            Y1        Y2        Y3        Y1HAT     Y1RESID
92,335,528    943,298       5,294,356     18.3409   13.7571   15.4822   18.4010   -0.0601
48,304,274    456,553       2,833,525     17.6930   13.0315   14.8570   17.7353   -0.0423
17,207,903    267,806       1,212,281     16.6609   12.4980   14.0080   17.0429   -0.3820
47,340,157    439,427       2,404,122     17.6729   12.9932   14.6927   17.6317   0.0411
2,644,567     24,167        334,008       14.7880   10.0927   12.7189   15.2445   -0.4564
14,650,080    163,637       627,806       16.5000   12.0054   13.3500   16.4692   0.0308
7,290,360     59,737        522,335       15.8021   10.9977   13.1661   15.9014   -0.0993
9,188,322     96,106        507,488       16.0334   11.4732   13.1372   16.1090   -0.0756
51,298,516    407,076       3,295,056     17.7532   12.9168   15.0079   17.7603   -0.0071
20,401,410    43,079        404,749       16.8311   10.6708   12.9110   15.6153   1.2158
87,756,129    [illegible]   4,260,353     18.2901   13.4969   15.2649   18.1659   0.1242
101,268,432   820,013       4,086,558     18.4333   13.6171   15.2232   18.2005   0.2328
3,556,025     34,723        184,700       15.0842   10.4552   12.1265   15.1054   -0.0212
124,986,166   1,174,540     6,301,421     18.6437   13.9764   15.6563   18.5945   0.0492
20,451,196    201,284       1,327,353     16.8336   12.2925   14.0987   16.9564   -0.1229
34,808,109    257,820       1,456,683     17.3654   12.4600   14.1917   17.1208   0.2445
104,858,322   944,998       5,896,392     18.4681   13.7589   15.5899   18.4580   0.0101
6,541,356     68,987        297,618       15.6937   11.1417   12.6036   15.6756   0.0181
37,668,126    400,317       2,500,071     17.4443   12.9000   14.7318   17.6085   -0.1642
4,988,905     56,524        311,251       15.4227   10.9424   12.6484   15.6056   -0.1829
62,828,100    582,241       4,126,465     17.9559   13.2746   15.2329   18.0451   -0.0892
172,960,157   1,120,382     11,588,283    18.9686   13.9292   16.2655   18.8899   0.0786
15,702,637    150,030       762,671       16.5693   11.9186   13.5446   16.5300   0.0394
5,418,786     48,134        276,293       15.5054   10.7817   12.5292   15.4683   0.0371
49,166,991    425,346       2,731,669     17.7107   12.9607   14.8204   17.6831   0.0277
46,164,427    313,279       1,945,860     17.6477   12.6548   14.4812   17.3630   0.2847
9,185,967     89,639        685,587       16.0332   11.4035   13.4380   16.2332   -0.2000
66,964,978    694,628       3,902,823     18.0197   13.4511   15.1772   18.0988   -0.0791
2,979,475     15,221        361,536       14.9073   9.6304    12.7981   15.0692   -0.1620


Notes: Y1 = ln Y; Y2 = ln X2; Y3 = ln X3.

The eigenvalues are 3.7861 and 187.5269, which will be used in Chapter 10.
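Several of the summary statistics in the EViews output above follow from RSS, n, k, and $R^2$ alone. The sketch below recomputes them. The information-criterion formulas (per-observation forms built from the Gaussian log likelihood) are an assumption about how EViews scales these criteria, but they reproduce the printed values to the displayed precision.

```python
import math

# Inputs read from the EViews output above (Cobb-Douglas model, Eq. (7.9.4)).
n, k = 51, 3       # observations and estimated coefficients (C, Y2, Y3)
rss = 3.415520     # Sum squared resid.
r2 = 0.964175      # R-squared

se_regression = math.sqrt(rss / (n - k))
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
# Gaussian log likelihood evaluated at the ML estimates, with sigma^2 = RSS/n.
loglik = -n / 2 * (1 + math.log(2 * math.pi) + math.log(rss / n))
aic = -2 * loglik / n + 2 * k / n
schwarz = -2 * loglik / n + k * math.log(n) / n
hannan_quinn = -2 * loglik / n + 2 * k * math.log(math.log(n)) / n
f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))

for name, value in [
    ("S.E. of regression", se_regression),
    ("Adjusted R-squared", adj_r2),
    ("Log likelihood", loglik),
    ("Akaike", aic),
    ("Schwarz", schwarz),
    ("Hannan-Quinn", hannan_quinn),
    ("F-statistic", f_stat),
]:
    print(f"{name:<20}{value:.6f}")
```

The small gap between the recomputed F (about 645.92) and the printed 645.9311 reflects the rounding of $R^2$ to six decimals.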


CHAPTER 8

Multiple Regression Analysis: The Problem of Inference


This chapter, a continuation of Chapter 5, extends the ideas of interval estimation and hypothesis testing developed there to models involving three or more variables. Although in many ways the concepts developed in Chapter 5 can be applied straightforwardly to the multiple regression model, a few additional features are unique to such models, and it is these features that will receive more attention in this chapter.


8.1 The Normality Assumption Once Again 


We know by now that if our sole objective is point estimation of the parameters of the regression models, the method of ordinary least squares (OLS), which does not make any assumption about the probability distribution of the disturbances $u_i$, will suffice. But if our objective is estimation as well as inference, then, as argued in Chapters 4 and 5, we need to assume that the $u_i$ follow some probability distribution.

For reasons already clearly spelled out, we assumed that the $u_i$ follow the normal distribution with zero mean and constant variance $\sigma^2$. We continue to make the same assumption for multiple regression models. With the normality assumption and following the discussion of Chapters 4 and 7, we find that the OLS estimators of the partial regression coefficients, which are identical with the maximum likelihood (ML) estimators, are best linear unbiased estimators (BLUE).¹ Moreover, the estimators $\hat\beta_2$, $\hat\beta_3$, and $\hat\beta_1$ are themselves normally distributed with means equal to true $\beta_2$, $\beta_3$, and $\beta_1$ and the variances given in Chapter 7. Furthermore, $(n - 3)\hat\sigma^2/\sigma^2$ follows the $\chi^2$ distribution with n − 3 df, and the three OLS estimators are distributed independently of $\hat\sigma^2$. The proofs follow the two-variable case discussed in Appendix 3A, Section 3A. As a result and following Chapter 5, one can show that, upon replacing $\sigma^2$ by its unbiased estimator $\hat\sigma^2$ in the computation of the standard errors, each of the following variables


¹With the normality assumption, the OLS estimators $\hat\beta_2$, $\hat\beta_3$, and $\hat\beta_1$ are minimum-variance estimators in the entire class of unbiased estimators, whether linear or not. In short, they are BUE (best unbiased estimators). See C. R. Rao, Linear Statistical Inference and Its Applications, John Wiley & Sons, New York, 1965, p. 258.
$$t = \frac{\hat\beta_1 - \beta_1}{\text{se}(\hat\beta_1)} \tag{8.1.1}$$

$$t = \frac{\hat\beta_2 - \beta_2}{\text{se}(\hat\beta_2)} \tag{8.1.2}$$

$$t = \frac{\hat\beta_3 - \beta_3}{\text{se}(\hat\beta_3)} \tag{8.1.3}$$

follows the t distribution with n − 3 df.

Note that the df are now n − 3 because in computing $\sum \hat u_i^2$ and hence $\hat\sigma^2$ we first need to estimate the three partial regression coefficients, which therefore put three restrictions on the residual sum of squares (RSS) (following this logic, in the four-variable case there will be n − 4 df, and so on). Therefore, the t distribution can be used to establish confidence intervals as well as test statistical hypotheses about the true population partial regression coefficients. Similarly, the $\chi^2$ distribution can be used to test hypotheses about the true $\sigma^2$. To demonstrate the actual mechanics, we use the following illustrative example.


Example 8.1 Child Mortality Example Revisited 


In Chapter 7 we regressed child mortality (CM) on per capita GNP (PGNP) and the female literacy rate (FLR) for a sample of 64 countries. The regression results given in Eq. (7.6.2) are reproduced below with some additional information:

$$\begin{aligned}
\widehat{\text{CM}}_i &= 263.6416 - 0.0056\,\text{PGNP}_i - 2.2316\,\text{FLR}_i\\
\text{se} &= \quad (11.5932) \qquad\quad (0.0020) \qquad\quad (0.2099)\\
t &= \quad (22.7411) \qquad (-2.8187) \qquad (-10.6293)\\
p\ \text{value} &= \quad (0.0000)^{*} \qquad\; (0.0065) \qquad\quad (0.0000)^{*}
\end{aligned} \tag{8.1.4}$$

$$R^2 = 0.7077 \qquad \bar R^2 = 0.6981$$

where * denotes extremely low value.

In Eq. (8.1.4) we have followed the format first introduced in Eq. (5.11.1), where the figures in the first set of parentheses are the estimated standard errors, those in the second set are the t values under the null hypothesis that the relevant population coefficient has a value of zero, and those in the third are the estimated p values. Also given are $R^2$ and adjusted $R^2$ values. We have already interpreted this regression in Example 7.1.

What about the statistical significance of the observed results? Consider, for example, the coefficient of PGNP of −0.0056. Is this coefficient statistically significant, that is, statistically different from zero? Likewise, is the coefficient of FLR of −2.2316 statistically significant? Are both coefficients statistically significant? To answer this and related questions, let us first consider the kinds of hypothesis testing that one may encounter in the context of a multiple regression model.


8.2 Hypothesis Testing in Multiple Regression: General Comments 


Once we go beyond the simple world of the two-variable linear regression model, hypothesis testing assumes 
several interesting forms, such as the following: 


1. Testing hypotheses about an individual partial regression coefficient (Section 8.3). 
2. Testing the overall significance of the estimated multiple regression model, that is, finding out if all the 
partial slope coefficients are simultaneously equal to zero (Section 8.4). 
3. Testing that two or more coefficients are equal to one another (Section 8.5).
4. Testing that the partial regression coefficients satisfy certain restrictions (Section 8.6).
5. Testing the stability of the estimated regression model over time or in different cross-sectional units (Section 8.7).
6. Testing the functional form of regression models (Section 8.8).


Since testing of one or more of these types occurs so commonly in empirical analysis, we devote a section 
to each type. 


8.3 Hypothesis Testing about Individual Regression Coefficients 


If we invoke the assumption that $u_i \sim N(0, \sigma^2)$, then, as noted in Section 8.1, we can use the t test to test a hypothesis about any individual partial regression coefficient. To illustrate the mechanics, consider the child mortality regression, Eq. (8.1.4). Let us postulate that

$$H_0\colon \beta_2 = 0 \quad \text{and} \quad H_1\colon \beta_2 \neq 0$$

The null hypothesis states that, with $X_3$ (female literacy rate) held constant, $X_2$ (PGNP) has no (linear) influence on Y (child mortality).² To test the null hypothesis, we use the t test given in Eq. (8.1.2). Following Chapter 5 (see Table 5.1), if the computed t value exceeds (in absolute value) the critical t value at the chosen level of significance, we may reject the null hypothesis; otherwise, we may not reject it. For our illustrative example, using Eq. (8.1.2) and noting that $\beta_2 = 0$ under the null hypothesis, we obtain


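$$t = \frac{-0.0056}{0.0020} = -2.8187 \tag{8.3.1}$$

as shown in Eq. (8.1.4). (Dividing the rounded figures gives −2.8; the reported −2.8187 is computed from the unrounded coefficient and standard error.)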
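The t ratios in Eq. (8.1.4) are simply each coefficient divided by its standard error, since the hypothesized value is zero. The sketch below recomputes them from the rounded figures printed in the text (using the standard error of 0.0020 for PGNP that appears in Eq. (8.3.1)), which shows how much of any discrepancy with the printed t values is due to rounding.

```python
# Coefficients and standard errors as printed, rounded to 4 decimals.
coef = {"intercept": 263.6416, "PGNP": -0.0056, "FLR": -2.2316}
se = {"intercept": 11.5932, "PGNP": 0.0020, "FLR": 0.2099}
printed_t = {"intercept": 22.7411, "PGNP": -2.8187, "FLR": -10.6293}

n, k = 64, 3
df = n - k  # 61 degrees of freedom

# Under H0: beta = 0, t = (b - 0) / se(b).
t = {name: coef[name] / se[name] for name in coef}
for name in coef:
    print(f"{name:<10} t = {t[name]:9.4f}   (printed: {printed_t[name]})")
print("df =", df)
```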

Notice that we have 64 observations. Therefore, the degrees of freedom in this example are 61 (why?). If you refer to the t table given in Appendix D, you will find no entry for 61 df; the closest entry is for 60 df. If we use these df and assume α, the level of significance (i.e., the probability of committing a Type I error), of 5 percent, the critical t value is 2.0 for a two-tail test (look up $t_{\alpha/2}$ for 60 df) or 1.671 for a one-tail test (look up $t_{\alpha}$ for 60 df).

For our example, the alternative hypothesis is two-sided. Therefore, we use the two-tail t value. Since the computed t value of 2.8187 (in absolute terms) exceeds the critical t value of 2, we can reject the null hypothesis that PGNP has no effect on child mortality. To put it more positively, with the female literacy rate held constant, per capita GNP has a significant (negative) effect on child mortality, as one would expect a priori. Graphically, the situation is as shown in Figure 8.1.

In practice, one does not have to assume a particular value of α to conduct hypothesis testing. One can simply use the p value given in Eq. (8.1.4), which in the present case is 0.0065. The interpretation of this p value (i.e., the exact level of significance) is that if the null hypothesis were true, the probability of obtaining a t value of as much as 2.8187 or greater (in absolute terms) is only 0.0065, or 0.65 percent, which is indeed a small probability, much smaller than the artificially adopted value of α = 5%.


²In most empirical investigations the null hypothesis is stated in this form, that is, taking the extreme position (a kind of straw man) that there is no relationship between the dependent variable and the explanatory variable under consideration. The idea here is to find out whether the relationship between the two is a trivial one to begin with.
Figure 8.1 The 95% confidence interval for t (60 df): the density f(t) with a 95% region of acceptance between −2.0 and +2.0 and a 2.5% critical region in each tail.


This example provides us an opportunity to decide whether we want to use a one-tail or a two-tail t test. Since a priori child mortality and per capita GNP are expected to be negatively related (why?), we should use the one-tail test. That is, our null and alternative hypotheses should be:

$$H_0\colon \beta_2 \ge 0 \quad \text{and} \quad H_1\colon \beta_2 < 0$$

As the reader knows by now, we can reject the null hypothesis on the basis of the one-tail t test in the present instance, since the computed t of −2.8187 lies below −1.671. If we can reject the null hypothesis in a two-sided test, we will have enough evidence to reject it in the one-sided scenario as long as the statistic is in the same direction as the test.

In Chapter 5 we saw the intimate connection between hypothesis testing and confidence interval estimation. For our example, the 95 percent confidence interval for $\beta_2$ is:

$$\hat\beta_2 - t_{\alpha/2}\,\text{se}(\hat\beta_2) \le \beta_2 \le \hat\beta_2 + t_{\alpha/2}\,\text{se}(\hat\beta_2)$$

which in our example becomes

$$-0.0056 - 2(0.0020) \le \beta_2 \le -0.0056 + 2(0.0020)$$

that is,

$$-0.0096 \le \beta_2 \le -0.0016 \tag{8.3.2}$$

that is, the interval from −0.0096 to −0.0016 includes the true $\beta_2$ coefficient with 95 percent confidence coefficient. Thus, if 100 samples of size 64 are selected and 100 confidence intervals like Eq. (8.3.2) are constructed, we expect 95 of them to contain the true population parameter $\beta_2$. Since the interval (8.3.2) does not include the null-hypothesized value of zero, we can reject the null hypothesis that the true $\beta_2$ is zero with 95 percent confidence.
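The interval arithmetic, and the equivalence between "zero lies outside the interval" and "reject $H_0$", can be sketched as follows. The critical value 2.0 is the tabulated two-tail 5 percent value for 60 df used in the text; the variable names are illustrative.

```python
b2_hat = -0.0056   # estimated PGNP coefficient
se_b2 = 0.0020     # its standard error
t_crit = 2.0       # two-tail 5% critical t for 60 df (closest table entry to 61 df)

lower = b2_hat - t_crit * se_b2
upper = b2_hat + t_crit * se_b2
# Zero outside the interval is equivalent to rejecting H0: beta2 = 0 at the 5% level.
reject_h0 = not (lower <= 0.0 <= upper)
print(f"95% CI for beta2: [{lower:.4f}, {upper:.4f}]; reject H0: {reject_h0}")
```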

Thus, whether we use the t test of significance as in (8.3.1) or the confidence interval estimation as in (8.3.2), we reach the same conclusion. This should not be surprising in view of the close connection between confidence interval estimation and hypothesis testing.

Following the procedure just described, we can test hypotheses about the other parameters of our child mortality regression model. The necessary data are already provided in Eq. (8.1.4). For example, suppose we want to test the hypothesis that, with the influence of PGNP held constant, the female literacy rate has no effect whatsoever on child mortality. We can confidently reject this hypothesis, for under this null hypothesis the p value of obtaining an absolute t value of as much as 10.6 or greater is practically zero.

Before moving on, remember that the t-testing procedure is based on the assumption that the error term $u_i$ follows the normal distribution. Although we cannot directly observe the $u_i$, we can observe their proxy, the $\hat u_i$, that is, the residuals. For our mortality regression, the histogram of the residuals is as shown in Figure 8.2.
Figure 8.2 Histogram of residuals from regression (8.1.4). Summary statistics shown with the histogram (series: residuals; sample 1-64; 64 observations): mean −4.95 × 10⁻¹⁴, median 0.709227, maximum 96.80276, minimum −84.26686, std. dev. 41.07980, skewness 0.227575, kurtosis 2.948855, Jarque-Bera 0.559405, probability 0.756009.


From the histogram it seems that the residuals are normally distributed. We can also compute the Jarque-Bera (JB) test of normality, as shown in Eq. (5.12.1). In our case the JB value is 0.5594 with a p value of 0.76.³ Therefore, it seems that the error term in our example follows the normal distribution. Of course, keep in mind that the JB test is a large-sample test and our sample of 64 observations is not necessarily large.
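The JB statistic can be reproduced from the skewness and kurtosis reported with Figure 8.2, using $\text{JB} = n\left[S^2/6 + (K - 3)^2/24\right]$ from Eq. (5.12.1). Because the statistic is referred to the $\chi^2$ distribution with 2 df, whose upper-tail probability is exactly $e^{-x/2}$, the p value needs nothing beyond the standard library.

```python
import math

n = 64
skewness = 0.227575   # from the Figure 8.2 summary
kurtosis = 2.948855

# JB = n [S^2/6 + (K - 3)^2/24]; asymptotically chi-square with 2 df under normality.
jb = n * (skewness ** 2 / 6 + (kurtosis - 3) ** 2 / 24)
# For 2 df the chi-square upper tail has the closed form P(X > x) = exp(-x/2).
p_value = math.exp(-jb / 2)
print(f"JB = {jb:.6f}, p = {p_value:.6f}")
```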


8.4 Testing the Overall Significance of the Sample Regression 


Throughout the previous section we were concerned with testing the significance of the estimated partial 
regression coefficients individually, that is, under the separate hypothesis that each true population partial 
regression coefficient was zero. But now consider the following hypothesis: 


$$H_0\colon \beta_2 = \beta_3 = 0 \tag{8.4.1}$$

This null hypothesis is a joint hypothesis that $\beta_2$ and $\beta_3$ are jointly or simultaneously equal to zero. A test of such a hypothesis is called a test of the overall significance of the observed or estimated regression line, that is, whether Y is linearly related to both $X_2$ and $X_3$.

Can the joint hypothesis in Eq. (8.4.1) be tested by testing the significance of $\hat\beta_2$ and $\hat\beta_3$ individually as in Section 8.3? The answer is no, and the reasoning is as follows.

In testing the individual significance of an observed partial regression coefficient in Section 8.3, we assumed implicitly that each test of significance was based on a different (i.e., independent) sample. Thus, in testing the significance of $\hat\beta_2$ under the hypothesis that $\beta_2 = 0$, it was assumed tacitly that the testing was based on a different sample from the one used in testing the significance of $\hat\beta_3$ under the null hypothesis that $\beta_3 = 0$. But to test the joint hypothesis of Eq. (8.4.1), if we use the same sample data, we shall be violating the assumption underlying the test procedure.⁴ The matter can be put differently: In Eq. (8.3.2) we established a 95 percent confidence interval for $\beta_2$. But if we use the same sample data to establish a confidence interval for $\beta_3$, say, with a confidence coefficient of 95 percent, we cannot assert that both $\beta_2$ and $\beta_3$ lie in their respective confidence intervals with a probability of $(1 - \alpha)(1 - \alpha) = (0.95)(0.95)$.


³For our example, the skewness value is 0.2276 and the kurtosis value is 2.9488. Recall that for a normally distributed variable the skewness and kurtosis values are, respectively, 0 and 3.
⁴In any given sample the cov($\hat\beta_2$, $\hat\beta_3$) may not be zero; that is, $\hat\beta_2$ and $\hat\beta_3$ may be correlated. See Eq. (7.4.17).
In other words, although the statements

$$\Pr\left[\hat\beta_2 - t_{\alpha/2}\,\text{se}(\hat\beta_2) \le \beta_2 \le \hat\beta_2 + t_{\alpha/2}\,\text{se}(\hat\beta_2)\right] = 1 - \alpha$$

$$\Pr\left[\hat\beta_3 - t_{\alpha/2}\,\text{se}(\hat\beta_3) \le \beta_3 \le \hat\beta_3 + t_{\alpha/2}\,\text{se}(\hat\beta_3)\right] = 1 - \alpha$$

are individually true, it is not true that the probability that the intervals

$$\left[\hat\beta_2 \pm t_{\alpha/2}\,\text{se}(\hat\beta_2),\; \hat\beta_3 \pm t_{\alpha/2}\,\text{se}(\hat\beta_3)\right]$$

simultaneously include $\beta_2$ and $\beta_3$ is $(1 - \alpha)^2$, because the intervals may not be independent when the same data are used to derive them. To state the matter differently,

. . . testing a series of single [individual] hypotheses is not equivalent to testing those same hypotheses jointly. The intuitive reason for this is that in a joint test of several hypotheses any single hypothesis is “affected” by the information in the other hypotheses.⁵


The upshot of the preceding argument is that for a given example (sample) only one confidence interval or only one test of significance can be obtained. How, then, does one test the simultaneous null hypothesis that $\beta_2 = \beta_3 = 0$? The answer follows.


The Analysis of Variance Approach to Testing the Overall Significance of an Observed Multiple Regression: The F Test


For reasons just explained, we cannot use the usual t test to test the joint hypothesis that the true partial slope coefficients are zero simultaneously. However, this joint hypothesis can be tested by the analysis of variance (ANOVA) technique first introduced in Section 5.9, which can be demonstrated as follows.

Recall the identity

$$\sum y_i^2 = \hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i} + \sum \hat u_i^2 \tag{8.4.2}$$

$$\text{TSS} = \text{ESS} + \text{RSS}$$

TSS has, as usual, n − 1 df and RSS has n − 3 df for reasons already discussed. ESS has 2 df since it is a function of $\hat\beta_2$ and $\hat\beta_3$. Therefore, following the ANOVA procedure discussed in Section 5.9, we can set up Table 8.1.

Now it can be shown⁶ that, under the assumption of normal distribution for $u_i$ and the null hypothesis $\beta_2 = \beta_3 = 0$, the variable

$$F = \frac{\left(\hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i}\right)/2}{\sum \hat u_i^2/(n - 3)} = \frac{\text{ESS/df}}{\text{RSS/df}} \tag{8.4.3}$$

is distributed as the F distribution with 2 and n − 3 df.


⁵Thomas B. Fomby, R. Carter Hill, and Stanley R. Johnson, Advanced Econometric Methods, Springer-Verlag, New York, 1984, p. [illegible].

⁶See K. A. Brownlee, Statistical Theory and Methodology in Science and Engineering, John Wiley & Sons, New York, 1960, pp. 278-280.
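Table 8.1

Source of Variation       SS                                                          df      MSS
Due to regression (ESS)   $\hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i}$   2       $\left(\hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i}\right)/2$
Due to residual (RSS)     $\sum \hat u_i^2$                                           n − 3   $\hat\sigma^2 = \sum \hat u_i^2/(n - 3)$
Total                     $\sum y_i^2$                                                n − 1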


What use can be made of the preceding F ratio? It can be proved⁷ that under the assumption that $u_i \sim N(0, \sigma^2)$,

$$E\left(\frac{\sum \hat u_i^2}{n - 3}\right) = E(\hat\sigma^2) = \sigma^2 \tag{8.4.4}$$

With the additional assumption that $\beta_2 = \beta_3 = 0$, it can be shown that

$$E\left(\frac{\hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i}}{2}\right) = \sigma^2 \tag{8.4.5}$$

Therefore, if the null hypothesis is true, both Eqs. (8.4.4) and (8.4.5) give identical estimates of true $\sigma^2$. This statement should not be surprising because if there is a trivial relationship between Y and $X_2$ and $X_3$, the sole source of variation in Y is due to the random forces represented by $u_i$. If, however, the null hypothesis is false, that is, $X_2$ and $X_3$ definitely influence Y, the equality between Eqs. (8.4.4) and (8.4.5) will not hold. In this case, the ESS will be relatively larger than the RSS, taking due account of their respective df. Therefore, the F value of Eq. (8.4.3) provides a test of the null hypothesis that the true slope coefficients are simultaneously zero. If the F value computed from Eq. (8.4.3) exceeds the critical F value from the F table at the α percent level of significance, we reject $H_0$; otherwise we do not reject it. Alternatively, if the p value of the observed F is sufficiently low, we can reject $H_0$.

Table 8.2 summarizes the F test. Turning to our illustrative example, we obtain the ANOVA table, as shown in Table 8.3.

Table 8.2 A Summary of the F Statistic

Null Hypothesis $H_0$         Alternative Hypothesis $H_1$     Critical Region: Reject $H_0$ If
$\sigma_1^2 = \sigma_2^2$     $\sigma_1^2 > \sigma_2^2$        $S_1^2/S_2^2 > F_{\alpha,\,\text{ndf},\,\text{ddf}}$
$\sigma_1^2 = \sigma_2^2$     $\sigma_1^2 \neq \sigma_2^2$     $S_1^2/S_2^2 > F_{\alpha/2,\,\text{ndf},\,\text{ddf}}$ or $< F_{(1-\alpha/2),\,\text{ndf},\,\text{ddf}}$

Notes:
1. $\sigma_1^2$ and $\sigma_2^2$ are the two population variances.
2. $S_1^2$ and $S_2^2$ are the two sample variances.
3. ndf and ddf denote, respectively, the numerator and denominator df.
4. In computing the F ratio, put the larger $S^2$ value in the numerator.
5. The critical F values are given in the last column. The first subscript of F is the level of significance, and the second and third subscripts are the numerator and denominator df.
6. Note that $F_{(1-\alpha/2),\,\text{ndf},\,\text{ddf}} = 1/F_{\alpha/2,\,\text{ddf},\,\text{ndf}}$.

⁷See K. A. Brownlee, Statistical Theory and Methodology in Science and Engineering, John Wiley & Sons, New York, 1960, pp. 278-280.
Table 8.3 ANOVA Table for the Child Mortality Example

Source of Variation    SS           df    MSS
Due to regression      257,362.4    2     128,681.2
Due to residuals       106,315.6    61    1742.88
Total                  363,678.0    63


Using Eq. (8.4.3), we obtain

$$F = \frac{128{,}681.2}{1742.88} = 73.8325 \tag{8.4.6}$$

The p value of obtaining an F value of as much as 73.8325 or greater is almost zero, leading to the rejection of the hypothesis that together PGNP and FLR have no effect on child mortality. If you were to use the conventional 5 percent level-of-significance value, the critical F value for 2 df in the numerator and 60 df in the denominator (the actual df, however, are 61) is about 3.15, or about 4.98 if you were to use the 1 percent level of significance. Obviously, the observed F of about 74 far exceeds any of these critical F values.

We can generalize the preceding F-testing procedure as follows.


Testing the Overall Significance of a Multiple Regression: The F Test

Decision Rule

Given the k-variable regression model:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i$$

To test the hypothesis

$$H_0\colon \beta_2 = \beta_3 = \cdots = \beta_k = 0$$

(i.e., all slope coefficients are simultaneously zero) versus

$H_1$: Not all slope coefficients are simultaneously zero

compute

$$F = \frac{\text{ESS/df}}{\text{RSS/df}} = \frac{\text{ESS}/(k - 1)}{\text{RSS}/(n - k)} \tag{8.4.7}$$

If $F > F_\alpha(k - 1, n - k)$, reject $H_0$; otherwise you do not reject it, where $F_\alpha(k - 1, n - k)$ is the critical F value at the α level of significance and (k − 1) numerator df and (n − k) denominator df. Alternatively, if the p value of F obtained from Eq. (8.4.7) is sufficiently low, one can reject $H_0$.


Needless to say, in the three-variable case (Y and $X_2$, $X_3$) k is 3, in the four-variable case k is 4, and so on.

In passing, note that most regression packages routinely calculate the F value (given in the analysis of variance table) along with the usual regression output, such as the estimated coefficients, their standard errors, t values, etc. The null hypothesis for the t computation is usually assumed to be $\beta_i = 0$.
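The decision rule of Eq. (8.4.7) can be expressed as a small Python function. This is a minimal sketch under stated assumptions, not the output of any particular regression package: the names `overall_f_test` and `f_upper_tail_2df` are hypothetical, the critical value still has to be read from an F table, and a p value is returned only when the numerator df equal 2. In that special case the F distribution has the closed-form upper tail $P(F > f) = (1 + 2f/\text{df}_{\text{den}})^{-\text{df}_{\text{den}}/2}$; other df would need the incomplete beta function from a statistics library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OverallFTest:
    f_stat: float
    df_num: int
    df_den: int
    reject: bool
    p_value: Optional[float]  # filled in only when df_num == 2


def f_upper_tail_2df(f_value: float, df_den: int) -> float:
    """P(F > f_value) for an F(2, df_den) variable: (1 + 2f/df_den) ** (-df_den/2)."""
    if f_value <= 0:
        return 1.0
    return (1.0 + 2.0 * f_value / df_den) ** (-df_den / 2.0)


def overall_f_test(ess: float, rss: float, n: int, k: int, f_critical: float) -> OverallFTest:
    """Eq. (8.4.7): F = [ESS/(k - 1)] / [RSS/(n - k)], with k counting the intercept."""
    df_num, df_den = k - 1, n - k
    f_stat = (ess / df_num) / (rss / df_den)
    p_value = f_upper_tail_2df(f_stat, df_den) if df_num == 2 else None
    return OverallFTest(f_stat, df_num, df_den, f_stat > f_critical, p_value)


# Child mortality example (Table 8.3): n = 64 observations, k = 3 parameters.
# 3.15 is the approximate 5 percent critical value for F(2, 60) quoted in the text.
result = overall_f_test(ess=257_362.4, rss=106_315.6, n=64, k=3, f_critical=3.15)
print(f"F = {result.f_stat:.4f} with ({result.df_num}, {result.df_den}) df, "
      f"reject H0: {result.reject}, p = {result.p_value:.1e}")
```

As a rough check of the closed form, `f_upper_tail_2df(3.15, 60)` and `f_upper_tail_2df(4.98, 60)` come out at about 0.05 and 0.01, in line with the critical values quoted for the child mortality example.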


Individual versus Joint Testing of Hypotheses 


In Section 8.3 we discussed the test of significance of a single regression coefficient, and in Section 8.4 we have discussed the joint or overall test of significance of the estimated regression (i.e., all slope coefficients are simultaneously equal to zero). We reiterate that these tests are different. Thus, on the basis of the t test or confidence interval (of Section 8.3) it is possible to accept the hypothesis that a particular slope coefficient, $\beta_k$, is zero, and yet reject the joint hypothesis that all slope coefficients are zero.


The lesson to be learned is that the joint “message” of individual confidence intervals is no substitute for a joint confidence region [implied by the F test] in performing joint tests of hypotheses and making joint confidence statements.⁸


An Important Relationship between $R^2$ and F


There is an intimate relationship between the coefficient of determination $R^2$ and the F test used in the analysis of variance. Assuming the normal distribution for the disturbances $u_i$ and the null hypothesis that $\beta_2 = \beta_3 = 0$, we have seen that

$$F = \frac{\text{ESS}/2}{\text{RSS}/(n-3)} \qquad (8.4.8)$$

is distributed as the F distribution with 2 and $n-3$ df.

More generally, in the k-variable case (including intercept), if we assume that the disturbances are normally distributed and that the null hypothesis is

$$H_0:\ \beta_2 = \beta_3 = \cdots = \beta_k = 0 \qquad (8.4.9)$$

then it follows that

$$F = \frac{\text{ESS}/(k-1)}{\text{RSS}/(n-k)} \qquad (8.4.7) = (8.4.10)$$

follows the F distribution with $k-1$ and $n-k$ df. (Note: The total number of parameters to be estimated is k, of which 1 is the intercept term.)

Let us manipulate Eq. (8.4.10) as follows:


$$
\begin{aligned}
F &= \frac{n-k}{k-1}\,\frac{\text{ESS}}{\text{RSS}}\\
  &= \frac{n-k}{k-1}\,\frac{\text{ESS}}{\text{TSS}-\text{ESS}}\\
  &= \frac{n-k}{k-1}\,\frac{\text{ESS}/\text{TSS}}{1-(\text{ESS}/\text{TSS})}\\
  &= \frac{n-k}{k-1}\,\frac{R^2}{1-R^2}\\
  &= \frac{R^2/(k-1)}{(1-R^2)/(n-k)}
\end{aligned}
\qquad (8.4.11)
$$

where use is made of the definition $R^2 = \text{ESS}/\text{TSS}$. Equation (8.4.11) shows how F and $R^2$ are related. These two vary directly. When $R^2 = 0$, F is zero ipso facto. The larger the $R^2$, the greater the F value. In the limit, when $R^2 = 1$, F is infinite. Thus the F test, which is a measure of the overall significance of the estimated regression, is also a test of significance of $R^2$. In other words, testing the null hypothesis in Eq. (8.4.9) is equivalent to testing the null hypothesis that (the population) $R^2$ is zero.


8 Fomby et al., op. cit., p. 42.
Table 8.4 ANOVA Table in Terms of $R^2$

Source of Variation    SS                        df       MSS*
Due to regression      $R^2(\sum y_i^2)$         2        $R^2(\sum y_i^2)/2$
Due to residuals       $(1-R^2)(\sum y_i^2)$     $n-3$    $(1-R^2)(\sum y_i^2)/(n-3)$
Total                  $\sum y_i^2$              $n-1$

*Note that in computing the F value there is no need to multiply $R^2$ and $(1-R^2)$ by $\sum y_i^2$ because it drops out, as shown in Eq. (8.4.12).


For the three-variable case, Eq. (8.4.11) becomes

$$F = \frac{R^2/2}{(1-R^2)/(n-3)} \qquad (8.4.12)$$

By virtue of the close connection between F and $R^2$, the ANOVA table (Table 8.1) can be recast as Table 8.4. For our illustrative example, using Eq. (8.4.12) we obtain:

$$F = \frac{0.7077/2}{(1-0.7077)/61} \approx 73.8$$

which is about the same as obtained before, except for the rounding errors.

One advantage of the F test expressed in terms of $R^2$ is its ease of computation: All that one needs to know is the $R^2$ value. Therefore, the overall F test of significance given in Eq. (8.4.7) can be recast in terms of $R^2$ as shown in Table 8.4.
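To see that Eq. (8.4.7) and Eq. (8.4.11) are the same statistic, the Python sketch below (function names are illustrative) computes F both ways from the Table 8.3 sums of squares. With the exact $R^2 = \text{ESS}/\text{TSS}$ the two agree to floating-point precision; with $R^2$ rounded to 0.7077 the result moves to about 73.84, which is the rounding error mentioned above.

```python
def f_from_sums(ess: float, rss: float, n: int, k: int) -> float:
    """Eq. (8.4.7): F = [ESS/(k - 1)] / [RSS/(n - k)]."""
    return (ess / (k - 1)) / (rss / (n - k))


def f_from_r2(r2: float, n: int, k: int) -> float:
    """Eq. (8.4.11): F = [R^2/(k - 1)] / [(1 - R^2)/(n - k)]."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


ess, rss = 257_362.4, 106_315.6   # Table 8.3
n, k = 64, 3
r2_exact = ess / (ess + rss)      # R^2 = ESS/TSS

f_sums = f_from_sums(ess, rss, n, k)
f_r2_exact = f_from_r2(r2_exact, n, k)   # same statistic, up to floating-point error
f_r2_rounded = f_from_r2(0.7077, n, k)   # R^2 rounded to four decimals, as in the text
print(f"{f_sums:.4f} {f_r2_exact:.4f} {f_r2_rounded:.4f}")
```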


Testing the Overall Significance of a Multiple Regression in Terms of $R^2$


Decision Rule 


Testing the overall significance of a regression in terms of $R^2$: Alternative but equivalent test to Eq. (8.4.7).
Given the k-variable regression model:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i$$

To test the hypothesis

$$H_0:\ \beta_2 = \beta_3 = \cdots = \beta_k = 0$$

versus

$H_1$: Not all slope coefficients are simultaneously zero

compute

$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \qquad (8.4.13)$$

If $F > F_\alpha(k-1,\, n-k)$, reject $H_0$; otherwise you may accept $H_0$, where $F_\alpha(k-1,\, n-k)$ is the critical F value at the $\alpha$ level of significance and $(k-1)$ numerator df and $(n-k)$ denominator df. Alternatively, if the p value of F obtained from Eq. (8.4.13) is sufficiently low, reject $H_0$.


Before moving on, return to Example 7.5 in Chapter 7. From regression (7.10.7) we observe that RGDP (relative per capita GDP) and RGDP squared explain only about 10.92 percent of the variation in GDPG (GDP growth rate) in a sample of 190 countries. This $R^2$ of 0.1092 seems a “low” value. Is it really statistically different from zero? How do we find that out?

Recall our earlier discussion in “An Important Relationship between $R^2$ and F” about the relationship between $R^2$ and the F value as given in Eq. (8.4.11) or Eq. (8.4.12) for the specific case of two regressors. As noted, if $R^2$ is zero, then F is zero ipso facto, which will be the case if the regressors have no impact whatsoever on the regressand. Therefore, if we insert $R^2 = 0.1092$ into formula (8.4.12), we obtain

$$F = \frac{0.1092/2}{(1-0.1092)/187} = 11.4618$$

Under the null hypothesis that $R^2 = 0$, the preceding F value follows the F distribution with 2 and 187 df in the numerator and denominator, respectively. (Note: There are 190 observations and two regressors, so $n - k = 190 - 3 = 187$.) From the F table we see that this F value is significant well beyond the 5 percent level (indeed beyond the 1 percent level); the p value is actually 0.00002. Therefore, we can reject the null hypothesis that the two regressors have no impact on the regressand, notwithstanding the fact that the $R^2$ is only 0.1092.

This example brings out an important empirical observation that in cross-sectional data involving several observations, one generally obtains a low $R^2$ because of the diversity of the cross-sectional units. Therefore, one should not be surprised or worried about finding low $R^2$'s in cross-sectional regressions. What is relevant is that the model is correctly specified, that the regressors have the correct (i.e., theoretically expected) signs, and that (hopefully) the regression coefficients are statistically significant. The reader should check that individually both of the regressors in Eq. (7.10.7) are statistically significant at the 5 percent or better level (i.e., lower than 5 percent).
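The arithmetic of this example can be reproduced with Eq. (8.4.12) and, because there are 2 numerator df, with the closed-form upper tail $(1 + 2F/\text{df}_{\text{den}})^{-\text{df}_{\text{den}}/2}$ of an F(2, ·) variable. A minimal Python sketch; the helper names are hypothetical.

```python
def f_from_r2(r2: float, n: int, k: int) -> float:
    """Eq. (8.4.12) in general form: F = [R^2/(k - 1)] / [(1 - R^2)/(n - k)]."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


def f_upper_tail_2df(f_value: float, df_den: int) -> float:
    """Closed-form P(F > f_value) when the numerator df equal 2."""
    return (1.0 + 2.0 * f_value / df_den) ** (-df_den / 2.0)


# Example 7.5 revisited: R^2 = 0.1092, 190 countries, intercept + RGDP + RGDP squared.
n, k, r2 = 190, 3, 0.1092
f_stat = f_from_r2(r2, n, k)
p_value = f_upper_tail_2df(f_stat, n - k)
print(f"F = {f_stat:.4f}, p = {p_value:.1e}")
```

The p value comes out at about 0.00002, the figure quoted in the text.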


The “Incremental” or “Marginal” Contribution of an Explanatory Variable 


In Chapter 7 we stated that generally we cannot allocate the $R^2$ value among the various regressors. In our child mortality example we found that the $R^2$ was 0.7077, but we cannot say what part of this value is due to the regressor PGNP and what part is due to female literacy rate (FLR) because of possible correlation between the two regressors in the sample at hand. We can shed more light on this using the analysis of variance technique.

For our illustrative example we found that individually $X_2$ (PGNP) and $X_3$ (FLR) were statistically significant on the basis of (separate) t tests. We have also found that on the basis of the F test collectively both the regressors have a significant effect on the regressand Y (child mortality).

Now suppose we introduce PGNP and FLR sequentially; that is, we first regress child mortality on PGNP and assess its significance and then add FLR to the model to find out whether it contributes anything (of course, the order in which PGNP and FLR enter can be reversed). By contribution we mean whether the addition of the variable to the model increases ESS (and hence $R^2$) “significantly” in relation to the RSS. This contribution may appropriately be called the incremental, or marginal, contribution of an explanatory variable.

The topic of incremental contribution is an important one in practice. In most empirical investigations 
the researcher may not be completely sure whether it is worth adding an X variable to the model knowing 
that several other X variables are already present in the model. One does not wish to include a variable(s) 
that contributes very little toward ESS. By the same token, one does not want to exclude a variable(s) that 
substantially increases ESS. But how does one decide whether an X variable significantly reduces RSS? The 
analysis of variance technique can be easily extended to answer this question. 
Table 8.5 ANOVA Table for Regression (8.4.14)

Source of Variation    SS           df    MSS
ESS (due to PGNP)      60,449.5     1     60,449.5
RSS                    303,228.5    62    4890.7822
Total                  363,678      63


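Suppose we first regress child mortality on PGNP and obtain the following regression:

$$
\begin{aligned}
\widehat{\text{CM}}_i &= 157.4244 - 0.0114\,\text{PGNP}_i\\
t &= (15.9894)\qquad(-3.5156)\qquad r^2 = 0.1662\\
p\ \text{value} &= (0.0000)\qquad(0.0008)\qquad \text{adj}\ r^2 = 0.1528
\end{aligned}
\qquad (8.4.14)
$$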


As these results show, PGNP has a significant effect on CM. The ANOVA table corresponding to the preceding regression is given in Table 8.5.

Assuming the disturbances $u_i$ are normally distributed and the hypothesis that PGNP has no effect on CM, we obtain the F value of

$$F = \frac{60{,}449.5}{4890.7822} = 12.3599 \qquad (8.4.15)$$

which follows the F distribution with 1 and 62 df. This F value is highly significant, as the computed p value is 0.0008. Thus, as before, we reject the hypothesis that PGNP has no effect on CM. Incidentally, note that $t^2 = (-3.5156)^2 = 12.3594$, which is approximately the same as the F value of Eq. (8.4.15), where the t value is obtained from Eq. (8.4.14). But this should not be surprising in view of the fact that the square of the t statistic with n df is equal to the F value with 1 df in the numerator and n df in the denominator, a relationship first established in Chapter 5. In the present example there are 64 observations, so the t statistic has 62 df and the corresponding F has 1 and 62 df.
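The $t^2 = F$ relationship for a single regressor can be confirmed from Table 8.5 and Eq. (8.4.14) in a few lines of Python. This only redoes the arithmetic; the small gap between the two numbers comes from the rounding of the printed t ratio.

```python
ess_pgnp = 60_449.5     # ESS due to PGNP (Table 8.5)
rss = 303_228.5         # RSS (Table 8.5)
n = 64                  # number of observations
df_resid = n - 2        # intercept + PGNP slope

f_stat = (ess_pgnp / 1) / (rss / df_resid)   # Eq. (8.4.15)
t_pgnp = -3.5156                             # t ratio of PGNP in Eq. (8.4.14)
print(f"F = {f_stat:.4f}, t^2 = {t_pgnp ** 2:.4f}")
```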

Having run the regression (8.4.14), let us suppose we decide to add FLR to the model and obtain the 
multiple regression (8.1.4). The questions we want to answer are: 


1. What is the marginal, or incremental, contribution of FLR, knowing that PGNP is already in the model 
and that it is significantly related to CM? 

2. Is the incremental contribution of FLR statistically significant? 

3. What is the criterion for adding variables to the model? 


The preceding questions can be answered by the ANOVA technique. To see this, let us construct Table 8.6. In this table $X_2$ refers to PGNP and $X_3$ refers to FLR. To assess the incremental contribution of $X_3$ after allowing for the contribution of $X_2$, we form

$$
\begin{aligned}
F &= \frac{Q_2/\text{df}}{Q_4/\text{df}}\\
  &= \frac{(\text{ESS}_{\text{new}} - \text{ESS}_{\text{old}})/\text{number of new regressors}}{\text{RSS}_{\text{new}}/\text{df}\ (= n - \text{number of parameters in the new model})}\\
  &= \frac{Q_2/1}{Q_4/61} \quad \text{for our example}
\end{aligned}
\qquad (8.4.16)
$$
Table 8.6 ANOVA Table to Assess Incremental Contribution of a Variable(s)

Source of Variation                  SS                                                                      df       MSS
ESS due to $X_2$ alone               $Q_1 = \hat{b}_{12}^2 \sum x_{2i}^2$                                    1        $Q_1/1$
ESS due to the addition of $X_3$     $Q_2 = Q_3 - Q_1$                                                       1        $Q_2/1$
ESS due to both $X_2$, $X_3$         $Q_3 = \hat{\beta}_2 \sum y_i x_{2i} + \hat{\beta}_3 \sum y_i x_{3i}$    2        $Q_3/2$
RSS                                  $Q_4 = Q_5 - Q_3$                                                       $n-3$    $Q_4/(n-3)$
Total                                $Q_5 = \sum y_i^2$                                                      $n-1$

Note: $\hat{b}_{12}$ is the slope coefficient in the regression of Y on $X_2$ alone.


where $\text{ESS}_{\text{new}}$ = ESS under the new model (i.e., after adding the new regressors, $= Q_3$), $\text{ESS}_{\text{old}}$ = ESS under the old model ($= Q_1$), and $\text{RSS}_{\text{new}}$ = RSS under the new model (i.e., after taking into account all the regressors, $= Q_4$). For our illustrative example the results are as shown in Table 8.7.


Table 8.7 ANOVA Table for the Illustrative Example: Incremental Analysis

Source of Variation               SS           df    MSS
ESS due to PGNP                   60,449.5     1     60,449.5
ESS due to the addition of FLR    196,912.9    1     196,912.9
ESS due to PGNP and FLR           257,362.4    2     128,681.2
RSS                               106,315.6    61    1742.8786
Total                             363,678      63


Now applying Eq. (8.4.16), we obtain:

$$F = \frac{196{,}912.9/1}{1742.8786} = 112.9814 \qquad (8.4.17)$$

Under the usual assumptions, this F value follows the F distribution with 1 and 61 df. The reader should check that this F value is highly significant, suggesting that the addition of FLR to the model significantly increases ESS and hence the $R^2$ value. Therefore, FLR should be added to the model. Again, note that if you square the t-statistic value of the FLR coefficient in the multiple regression (8.1.4), which is $(-10.6293)^2$, you will obtain the F value of Eq. (8.4.17), save for the rounding errors.

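Incidentally, the F ratio of Eq. (8.4.16) can be recast by using the $R^2$ values only, as we did in Eq. (8.4.13). As Exercise 8.2 shows, the F ratio of Eq. (8.4.16) is equivalent to the following F ratio:⁹

$$
\begin{aligned}
F &= \frac{(R^2_{\text{new}} - R^2_{\text{old}})/\text{df}}{(1 - R^2_{\text{new}})/\text{df}}\\
  &= \frac{(R^2_{\text{new}} - R^2_{\text{old}})/\text{number of new regressors}}{(1 - R^2_{\text{new}})/\text{df}\ (= n - \text{number of parameters in the new model})}
\end{aligned}
\qquad (8.4.18)
$$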


This F ratio follows the F distribution with the appropriate numerator and denominator df, 1 and 61 in our illustrative example.


9 The following F test is a special case of the more general F test given in Eq. (8.6.9) or Eq. (8.6.10) in Section 8.6.
For our example, $R^2_{\text{new}} = 0.7077$ (from Eq. [8.1.4]) and $R^2_{\text{old}} = 0.1662$ (from Eq. [8.4.14]). Therefore,

$$F = \frac{(0.7077 - 0.1662)/1}{(1 - 0.7077)/61} \approx 113.01 \qquad (8.4.19)$$

which is about the same as that obtained from Eq. (8.4.17), except for the rounding errors. This F is highly significant, reinforcing our earlier finding that the variable FLR belongs in the model.

A cautionary note: If you use the $R^2$ version of the F test given in Eq. (8.4.18), make sure that the dependent variable in the new and the old models is the same. If they are different, use the F test given in Eq. (8.4.16).
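Eqs. (8.4.16) and (8.4.18) can be compared directly in a short Python sketch (helper names are illustrative). Using the unrounded $R^2$ values implied by Table 8.7 ($\text{ESS}/\text{TSS}$), the two versions coincide; the rounded values 0.7077 and 0.1662 account for the gap between 112.98 and about 113.01.

```python
def incremental_f_from_ess(ess_new, ess_old, rss_new, n_new_regressors, df_resid_new):
    """Eq. (8.4.16)."""
    return ((ess_new - ess_old) / n_new_regressors) / (rss_new / df_resid_new)


def incremental_f_from_r2(r2_new, r2_old, n_new_regressors, df_resid_new):
    """Eq. (8.4.18); valid only when both models share the same dependent variable."""
    return ((r2_new - r2_old) / n_new_regressors) / ((1.0 - r2_new) / df_resid_new)


tss = 363_678.0
ess_old, ess_new, rss_new = 60_449.5, 257_362.4, 106_315.6   # Table 8.7
df_resid_new = 64 - 3   # 64 observations, 3 parameters in the new model

f_ess = incremental_f_from_ess(ess_new, ess_old, rss_new, 1, df_resid_new)
f_r2_exact = incremental_f_from_r2(ess_new / tss, ess_old / tss, 1, df_resid_new)
f_r2_rounded = incremental_f_from_r2(0.7077, 0.1662, 1, df_resid_new)
print(f"{f_ess:.4f} {f_r2_exact:.4f} {f_r2_rounded:.2f}")
```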


When to Add a New Variable 


The F-test procedure just outlined provides a formal method of deciding whether a variable should be added to a regression model. Often researchers are faced with the task of choosing from several competing models involving the same dependent variable but with different explanatory variables. As a matter of ad hoc choice (because very often the theoretical foundation of the analysis is weak), these researchers frequently choose the model that gives the highest adjusted $R^2$, $\bar{R}^2$. Therefore, if the inclusion of a variable increases $\bar{R}^2$, it is retained in the model although it does not reduce RSS significantly in the statistical sense. The question then becomes: When does the adjusted $R^2$ increase? It can be shown that $\bar{R}^2$ will increase if the t value of the coefficient of the newly added variable is larger than 1 in absolute value, where the t value is computed under the hypothesis that the population value of the said coefficient is zero (i.e., the t value computed from Eq. [5.3.2] under the hypothesis that the true $\beta$ value is zero).¹⁰ The preceding criterion can also be stated differently: $\bar{R}^2$ will increase with the addition of an extra explanatory variable only if the $F\,(= t^2)$ value of that variable exceeds 1.

Applying either criterion, the FLR variable in our child mortality example, with a t value of −10.6293 or an F value of 112.9814, should increase $\bar{R}^2$, which indeed it does: when FLR is added to the model, $\bar{R}^2$ increases from 0.1528 to 0.6981.


When to Add a Group of Variables 


Can we develop a similar rule for deciding whether it is worth adding (or dropping) a group of variables from a model? The answer should be apparent from Eq. (8.4.18): If adding (dropping) a group of variables to the model gives an F value greater (less) than 1, $\bar{R}^2$ will increase (decrease). Of course, from Eq. (8.4.18) one can easily find out whether the addition (subtraction) of a group of variables significantly increases (decreases) the explanatory power of a regression model.
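The adjusted-$R^2$ rule can be checked numerically. The Python sketch below assumes the usual formula $\bar{R}^2 = 1 - (1 - R^2)(n-1)/(n-k)$, with k counting the intercept; it reproduces the 0.1528 and 0.6981 values for the two child mortality models and then checks, on an arbitrary grid of $R^2$ values chosen only for illustration, that adding one regressor raises $\bar{R}^2$ exactly when the incremental F of Eq. (8.4.18) exceeds 1. Function names are hypothetical.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k), with k counting the intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


def incremental_f(r2_new: float, r2_old: float, added: int, n: int, k_new: int) -> float:
    """Eq. (8.4.18) with df = n - k_new in the denominator."""
    return ((r2_new - r2_old) / added) / ((1.0 - r2_new) / (n - k_new))


n = 64
adj_old = adjusted_r2(0.1662, n, k=2)   # CM on PGNP, Eq. (8.4.14)
adj_new = adjusted_r2(0.7077, n, k=3)   # CM on PGNP and FLR, Eq. (8.1.4)
f_flr = incremental_f(0.7077, 0.1662, added=1, n=n, k_new=3)
print(f"adjusted R^2: {adj_old:.4f} -> {adj_new:.4f}; F for FLR = {f_flr:.2f}")

# Illustrative grid: adding one regressor raises adjusted R^2 exactly when F > 1.
rule_holds = all(
    (adjusted_r2(r2_old + delta, n, 3) > adjusted_r2(r2_old, n, 2))
    == (incremental_f(r2_old + delta, r2_old, 1, n, 3) > 1.0)
    for r2_old in [0.05 + 0.1 * j for j in range(9)]
    for delta in [0.001, 0.005, 0.01, 0.02, 0.05, 0.1]
)
print("rule holds on grid:", rule_holds)
```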


8.5 Testing the Equality of Two Regression Coefficients 


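Suppose in the multiple regression

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + u_i \qquad (8.5.1)$$

we want to test the hypotheses

$$
\begin{aligned}
H_0&:\ \beta_3 = \beta_4 \quad\text{or}\quad (\beta_3 - \beta_4) = 0\\
H_1&:\ \beta_3 \neq \beta_4 \quad\text{or}\quad (\beta_3 - \beta_4) \neq 0
\end{aligned}
\qquad (8.5.2)
$$

that is, the two slope coefficients $\beta_3$ and $\beta_4$ are equal.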


10 For proof, see Dennis J. Aigner, Basic Econometrics, Prentice Hall, Englewood Cliffs, NJ, 1971, pp. 91-92.
Such a null hypothesis is of practical importance. For example, let Eq. (8.5.1) represent the demand function for a commodity where Y = amount of a commodity demanded, $X_2$ = price of the commodity, $X_3$ = income of the consumer, and $X_4$ = wealth of the consumer. The null hypothesis in this case means that the income and wealth coefficients are the same. Or, if Y and the X's are expressed in logarithmic form, the null hypothesis in Eq. (8.5.2) implies that the income and wealth elasticities of consumption are the same. (Why?)

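How do we test such a null hypothesis? Under the classical assumptions, it can be shown that

$$t = \frac{(\hat{\beta}_3 - \hat{\beta}_4) - (\beta_3 - \beta_4)}{\text{se}(\hat{\beta}_3 - \hat{\beta}_4)} \qquad (8.5.3)$$

follows the t distribution with $(n-4)$ df because Eq. (8.5.1) is a four-variable model or, more generally, with $(n-k)$ df, where k is the total number of parameters estimated, including the constant term. The $\text{se}(\hat{\beta}_3 - \hat{\beta}_4)$ is obtained from the following well-known formula (see Appendix A for details):

$$\text{se}(\hat{\beta}_3 - \hat{\beta}_4) = \sqrt{\text{var}(\hat{\beta}_3) + \text{var}(\hat{\beta}_4) - 2\,\text{cov}(\hat{\beta}_3, \hat{\beta}_4)} \qquad (8.5.4)$$

If we substitute the null hypothesis and the expression for the $\text{se}(\hat{\beta}_3 - \hat{\beta}_4)$ into Eq. (8.5.3), our test statistic becomes

$$t = \frac{\hat{\beta}_3 - \hat{\beta}_4}{\sqrt{\text{var}(\hat{\beta}_3) + \text{var}(\hat{\beta}_4) - 2\,\text{cov}(\hat{\beta}_3, \hat{\beta}_4)}} \qquad (8.5.5)$$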


Now the testing procedure involves the following steps:

1. Estimate $\beta_3$ and $\beta_4$. Any standard computer package can do that.

2. Most standard computer packages routinely compute the variances and covariances of the estimated parameters.¹¹ From these estimates the standard error in the denominator of Eq. (8.5.5) can be easily obtained.

3. Obtain the t ratio from Eq. (8.5.5). Note the null hypothesis in the present case is $(\beta_3 - \beta_4) = 0$.

4. If the t value computed from Eq. (8.5.5) exceeds, in absolute value, the critical t value at the designated level of significance for given df, then you can reject the null hypothesis; otherwise, you do not reject it. Alternatively, if the p value of the t statistic from Eq. (8.5.5) is reasonably low, one can reject the null hypothesis. Note that the lower the p value, the greater the evidence against the null hypothesis. Therefore, when we say that a p value is low or reasonably low, we mean that it is less than the significance level, such as 10, 5, or 1 percent. Some personal judgment is involved in this decision.


Example 8.2 The Cubic Cost Function Revisited 


Recall the cubic total cost function estimated in Example 7.4, Section 7.10, which for convenience is reproduced below:

$$
\begin{aligned}
\hat{Y}_i &= 141.7667 + 63.4777X_i - 12.9615X_i^2 + 0.9396X_i^3\\
\text{se} &= (6.3753)\quad(4.7786)\quad(0.9857)\quad(0.0591)\\
&\text{cov}(\hat{\beta}_3, \hat{\beta}_4) = -0.0576;\qquad R^2 = 0.9983
\end{aligned}
\qquad (7.10.6)
$$

where Y is total cost and X is output, and where the figures in parentheses are the estimated standard errors.

11 The algebraic expression for the covariance formula is rather involved. Appendix C provides a compact expression for it, however, using matrix notation.
Suppose we want to test the hypothesis that the coefficients of the $X^2$ and $X^3$ terms in the cubic cost function are the same, that is, $\beta_3 = \beta_4$ or $(\beta_3 - \beta_4) = 0$. In the regression (7.10.6) we have all the necessary output to conduct the t test of Eq. (8.5.5). The actual mechanics are as follows:

$$
\begin{aligned}
t &= \frac{\hat{\beta}_3 - \hat{\beta}_4}{\sqrt{\text{var}(\hat{\beta}_3) + \text{var}(\hat{\beta}_4) - 2\,\text{cov}(\hat{\beta}_3, \hat{\beta}_4)}}\\
  &= \frac{-12.9615 - 0.9396}{\sqrt{(0.9857)^2 + (0.0591)^2 - 2(-0.0576)}}\\
  &= \frac{-13.9011}{1.0442} \approx -13.313
\end{aligned}
\qquad (8.5.6)
$$

The reader can verify that for 6 df (why?) the observed t value exceeds, in absolute value, the critical t value even at the 0.002 (or 0.2 percent) level of significance (two-tail test); the p value is extremely small, 0.000006. Hence we can reject the hypothesis that the coefficients of $X^2$ and $X^3$ in the cubic cost function are identical.
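Steps 2 to 4 reduce to a few lines once the estimates, their standard errors, and their covariance are available. A minimal Python sketch with a hypothetical function name, using the figures reported in regression (7.10.6):

```python
import math


def t_equal_coefficients(b3: float, b4: float, se3: float, se4: float, cov34: float):
    """Eq. (8.5.5): t for H0: beta3 = beta4, together with the se of Eq. (8.5.4)."""
    se_diff = math.sqrt(se3 ** 2 + se4 ** 2 - 2.0 * cov34)
    return (b3 - b4) / se_diff, se_diff


# Cubic cost function (7.10.6): coefficients of X^2 and X^3, their se's, and their covariance.
t_stat, se_diff = t_equal_coefficients(b3=-12.9615, b4=0.9396, se3=0.9857, se4=0.0591, cov34=-0.0576)
print(f"se = {se_diff:.4f}, t = {t_stat:.4f}")
```

The result still has to be compared with a tabulated critical t value for the appropriate df, exactly as in step 4.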


8.6 Restricted Least Squares: Testing Linear Equality Restrictions 


There are occasions where economic theory may suggest that the coefficients in a regression model satisfy some linear equality restrictions. For instance, consider the Cobb-Douglas production function:

$$Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} e^{u_i} \qquad (7.9.1) = (8.6.1)$$

where Y = output, $X_2$ = labor input, and $X_3$ = capital input. Written in log form, the equation becomes

$$\ln Y_i = \beta_0 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i \qquad (8.6.2)$$

where $\beta_0 = \ln \beta_1$.

Now if there are constant returns to scale (equiproportional change in output for an equiproportional change in the inputs), economic theory would suggest that

$$\beta_2 + \beta_3 = 1 \qquad (8.6.3)$$

which is an example of a linear equality restriction.¹²


How does one find out if there are constant returns to scale, that is, if the restriction (8.6.3) is valid? There 
are two approaches. 




The t-Test Approach 


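The simplest procedure is to estimate Eq. (8.6.2) in the usual manner without taking into account the restriction (8.6.3) explicitly. This is called the unrestricted or unconstrained regression. Having estimated $\beta_2$ and $\beta_3$ (say, by the OLS method), a test of the hypothesis or restriction (8.6.3) can be conducted by the t test of Eq. (8.5.3), namely,

$$
\begin{aligned}
t &= \frac{(\hat{\beta}_2 + \hat{\beta}_3) - (\beta_2 + \beta_3)}{\text{se}(\hat{\beta}_2 + \hat{\beta}_3)}\\
  &= \frac{(\hat{\beta}_2 + \hat{\beta}_3) - 1}{\sqrt{\text{var}(\hat{\beta}_2) + \text{var}(\hat{\beta}_3) + 2\,\text{cov}(\hat{\beta}_2, \hat{\beta}_3)}}
\end{aligned}
\qquad (8.6.4)
$$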


12 If we had $\beta_2 + \beta_3 < 1$, this relation would be an example of a linear inequality restriction. To handle such restrictions, one needs to use mathematical programming techniques.
where $(\beta_2 + \beta_3) = 1$ under the null hypothesis and where the denominator is the standard error of $(\hat{\beta}_2 + \hat{\beta}_3)$. Then following Section 8.5, if the t value computed from Eq. (8.6.4) exceeds, in absolute value, the critical t value at the chosen level of significance, we reject the hypothesis of constant returns to scale; otherwise we do not reject it.
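Eqs. (8.5.5) and (8.6.4) are both special cases of a t test on a linear combination $c_a\hat{\beta}_a + c_b\hat{\beta}_b$ of two estimates, whose variance is $c_a^2\,\text{var}(\hat{\beta}_a) + c_b^2\,\text{var}(\hat{\beta}_b) + 2c_a c_b\,\text{cov}(\hat{\beta}_a, \hat{\beta}_b)$. A minimal Python sketch (the function name is hypothetical): weights (1, −1) with r = 0 give Eq. (8.5.5), and weights (1, 1) with r = 1 give the constant-returns test of Eq. (8.6.4). It is exercised here with the Example 8.2 figures, since those are the ones for which the text reports a covariance.

```python
import math


def linear_combination_t(b, c, var, cov, r):
    """t for H0: c[0]*beta_a + c[1]*beta_b = r.

    b = (b_a, b_b) are the estimates, var = (var(b_a), var(b_b)), cov = cov(b_a, b_b).
    The standard error is sqrt(c0^2 var_a + c1^2 var_b + 2 c0 c1 cov).
    """
    (ba, bb), (c0, c1), (va, vb) = b, c, var
    se = math.sqrt(c0 ** 2 * va + c1 ** 2 * vb + 2.0 * c0 * c1 * cov)
    return (c0 * ba + c1 * bb - r) / se


# Example 8.2 figures from regression (7.10.6).
b = (-12.9615, 0.9396)
var = (0.9857 ** 2, 0.0591 ** 2)
cov = -0.0576

t_diff = linear_combination_t(b, (1.0, -1.0), var, cov, r=0.0)           # Eq. (8.5.5)
t_sum_at_estimate = linear_combination_t(b, (1.0, 1.0), var, cov, r=b[0] + b[1])  # zero by construction
print(f"t for beta3 = beta4: {t_diff:.3f}")
```

For the Cobb-Douglas restriction one would pass the unrestricted estimates of $\beta_2$ and $\beta_3$, their variances and covariance, weights (1, 1), and r = 1.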


The F-Test Approach: Restricted Least Squares 


The preceding t test is a kind of postmortem examination because we try to find out whether the linear restriction is satisfied after estimating the “unrestricted” regression. A direct approach would be to incorporate the restriction (8.6.3) into the estimating procedure at the outset. In the present example, this procedure can be done easily. From (8.6.3) we see that

$$\beta_2 = 1 - \beta_3 \qquad (8.6.5)$$

or

$$\beta_3 = 1 - \beta_2 \qquad (8.6.6)$$

Therefore, using either of these equalities, we can eliminate one of the $\beta$ coefficients in Eq. (8.6.2) and estimate the resulting equation. Thus, if we use Eq. (8.6.5), we can write the Cobb-Douglas production function as

$$
\begin{aligned}
\ln Y_i &= \beta_0 + (1 - \beta_3)\ln X_{2i} + \beta_3 \ln X_{3i} + u_i\\
        &= \beta_0 + \ln X_{2i} + \beta_3(\ln X_{3i} - \ln X_{2i}) + u_i
\end{aligned}
$$

or

$$(\ln Y_i - \ln X_{2i}) = \beta_0 + \beta_3(\ln X_{3i} - \ln X_{2i}) + u_i \qquad (8.6.7)$$

or

$$\ln(Y_i/X_{2i}) = \beta_0 + \beta_3 \ln(X_{3i}/X_{2i}) + u_i \qquad (8.6.8)$$

where $(Y_i/X_{2i})$ = output/labor ratio and $(X_{3i}/X_{2i})$ = capital/labor ratio, quantities of great economic importance.

Notice how the original Eq. (8.6.2) is transformed. Once we estimate $\beta_3$ from Eq. (8.6.7) or Eq. (8.6.8), $\beta_2$ can be easily estimated from the relation (8.6.5). Needless to say, this procedure will guarantee that the sum of the estimated coefficients of the two inputs will equal 1. The procedure outlined in Eq. (8.6.7) or Eq. (8.6.8) is known as restricted least squares (RLS). This procedure can be generalized to models containing any number of explanatory variables and more than one linear equality restriction. The generalization can be found in Theil.¹³ (See also general F testing below.)

How do we compare the unrestricted and restricted least-squares regressions? In other words, how do we know that, say, the restriction (8.6.3) is valid? This question can be answered by applying the F test as follows. Let

$\sum \hat{u}_{\text{UR}}^2$ = RSS of the unrestricted regression (8.6.2)
$\sum \hat{u}_{\text{R}}^2$ = RSS of the restricted regression (8.6.7)
m = number of linear restrictions (1 in the present example)
k = number of parameters in the unrestricted regression
n = number of observations


13 Henri Theil, Principles of Econometrics, John Wiley & Sons, New York, 1971, pp. 43-45.
Then,

$$
\begin{aligned}
F &= \frac{(\text{RSS}_{\text{R}} - \text{RSS}_{\text{UR}})/m}{\text{RSS}_{\text{UR}}/(n-k)}\\
  &= \frac{\left(\sum \hat{u}_{\text{R}}^2 - \sum \hat{u}_{\text{UR}}^2\right)/m}{\sum \hat{u}_{\text{UR}}^2/(n-k)}
\end{aligned}
\qquad (8.6.9)
$$

follows the F distribution with m, $(n-k)$ df. (Note: UR and R stand for unrestricted and restricted, respectively.)

The F test above can also be expressed in terms of $R^2$ as follows:

$$F = \frac{(R^2_{\text{UR}} - R^2_{\text{R}})/m}{(1 - R^2_{\text{UR}})/(n-k)} \qquad (8.6.10)$$

where $R^2_{\text{UR}}$ and $R^2_{\text{R}}$ are, respectively, the $R^2$ values obtained from the unrestricted and restricted regressions, that is, from the regressions (8.6.2) and (8.6.7). It should be noted that

$$R^2_{\text{UR}} \ge R^2_{\text{R}} \qquad (8.6.11)$$

and

$$\sum \hat{u}_{\text{R}}^2 \ge \sum \hat{u}_{\text{UR}}^2 \qquad (8.6.12)$$

In Exercise 8.4 you are asked to justify these statements.

A cautionary note: In using Eq. (8.6.10) keep in mind that if the dependent variable in the restricted and unrestricted models is not the same, $R^2_{\text{UR}}$ and $R^2_{\text{R}}$ are not directly comparable. In that case, use the procedure described in Chapter 7 to render the two $R^2$ values comparable (see Example 8.3 below) or use the F test given in Eq. (8.6.9).


Example 8.3 The Cobb-Douglas Production Function for Indian Manufacturing Sector, 2008-09 


By way of illustrating the preceding discussion, consider the data given in Table 8.8. Attempting to fit the Cobb-Douglas production function to these data yielded the following results:

$$
\begin{aligned}
\ln \widehat{\text{Output}}_i &= 0.790 + 0.799 \ln \text{Capital}_i + 0.254 \ln \text{Labour}_i\\
t &= (2.008)\qquad(9.999)\qquad(2.806)\\
p\ \text{value} &= (0.054)\qquad(0.000)\qquad(0.009)\\
R^2 &= 0.979 \qquad \text{RSS}_{\text{UR}} = 3.967
\end{aligned}
\qquad (8.6.13)
$$

where $\text{RSS}_{\text{UR}}$ is the unrestricted RSS, as we have put no restriction on estimating Eq. (8.6.13).

We have already seen in Chapter 7 how to interpret the coefficients of the Cobb-Douglas production function. As you can see, the output/labour elasticity is about 0.254 and the output/capital elasticity is about 0.799. If we add these coefficients, we obtain 1.053, suggesting that perhaps the Indian economy during the stated time period was experiencing increasing returns to scale. Of course, we do not know if 1.053 is statistically different from 1.

To see if that is the case, let us impose the restriction of constant returns to scale, which gives the following regression:

$$
\begin{aligned}
\ln \widehat{(\text{Output}/\text{Labour})}_i &= 1.347 + 0.821 \ln(\text{Capital}/\text{Labour})_i\\
t &= (6.131)\qquad(10.123)\\
p\ \text{value} &= (0.000)\qquad(0.000)\\
R^2 &= 0.774 \qquad \text{RSS}_{\text{R}} = 4.355
\end{aligned}
\qquad (8.6.14)
$$

where $\text{RSS}_{\text{R}}$ is the restricted RSS, for we have imposed the restriction that there are constant returns to scale.


Table 8.8 Output, Capital and Labour Employed in Indian Manufacturing Sector, 2008-09

Obs.   Output        Capital       Labour
1      15,254        11,948        299
2      21,240,271    9,703,792     909,828
3      3,676,790     1,276,381     126,338
4      2,953,967     450,326       62,864
5      352,625       110,925       7,068
6      7,640,548     5,902,559     126,890
7      6,496,541     1,766,781     71,527
8      3,031,277     1,150,869     69,038
9      2,747,594     641,588       87,552
10     2,752,579     764,899       40,545
11     50,808,787    22,905,355    871,459
12     14,433,596    4,576,852     377,322
13     4,227,948     2,977,150     84,497
14     1,349,375     460,448       45,033
15     5,910,751     3,555,204     122,524
16     22,581,276    7,934,054     598,070
17     7,090,489     1,461,807     331,043
18     9,040,202     3,725,013     202,428
19     60,017,352    22,829,428    1,034,201
20     6,706         2,478         2,313
21     192,159       136,232       4,571
22     12,806        6,516         2,468
23     6,953,247     5,847,911     174,774
24     1,455,756     565,729       39,356
25     10,551,352    3,366,691     431,568
26     9,066,470     4,320,087     275,950
27     30,080,195    10,677,243    1,456,155
28     75,598        59,451        23,643
29     20,046,266    10,280,773    574,874
30     8,292,360     3,310,161     172,861
31     14,179,648    5,940,658     449,887

Source: Annual Survey of Industries 2008-09, Central Statistical Organization, Govt. of India.
Note: Output and Capital are measured in Rs. lakhs, and Labour is the number of workers employed.
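The restricted least-squares procedure of Eqs. (8.6.7) and (8.6.8) can be sketched in plain Python on the Table 8.8 data: the unrestricted model regresses ln Output on ln Capital and ln Labour, while the restricted model regresses ln(Output/Labour) on ln(Capital/Labour) and recovers the labour elasticity from the restriction. This is a minimal illustration under stated assumptions, not the software used for the text: the helper names are hypothetical, the rows are taken in the order printed, and although the estimates should be close to Eqs. (8.6.13) and (8.6.14) if the table was transcribed faithfully, that match has not been verified here.

```python
import math

# Table 8.8 in the order printed: (output, capital, labour) for each of the 31 observations.
TABLE_8_8 = [
    (15_254, 11_948, 299), (21_240_271, 9_703_792, 909_828), (3_676_790, 1_276_381, 126_338),
    (2_953_967, 450_326, 62_864), (352_625, 110_925, 7_068), (7_640_548, 5_902_559, 126_890),
    (6_496_541, 1_766_781, 71_527), (3_031_277, 1_150_869, 69_038), (2_747_594, 641_588, 87_552),
    (2_752_579, 764_899, 40_545), (50_808_787, 22_905_355, 871_459), (14_433_596, 4_576_852, 377_322),
    (4_227_948, 2_977_150, 84_497), (1_349_375, 460_448, 45_033), (5_910_751, 3_555_204, 122_524),
    (22_581_276, 7_934_054, 598_070), (7_090_489, 1_461_807, 331_043), (9_040_202, 3_725_013, 202_428),
    (60_017_352, 22_829_428, 1_034_201), (6_706, 2_478, 2_313), (192_159, 136_232, 4_571),
    (12_806, 6_516, 2_468), (6_953_247, 5_847_911, 174_774), (1_455_756, 565_729, 39_356),
    (10_551_352, 3_366_691, 431_568), (9_066_470, 4_320_087, 275_950), (30_080_195, 10_677_243, 1_456_155),
    (75_598, 59_451, 23_643), (20_046_266, 10_280_773, 574_874), (8_292_360, 3_310_161, 172_861),
    (14_179_648, 5_940_658, 449_887),
]


def ols_one(x, y):
    """Intercept and slope from an OLS regression of y on a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def ols_two(x2, x3, y):
    """Intercept and two slopes from the deviation-form normal equations."""
    n = len(y)
    m2, m3, my = sum(x2) / n, sum(x3) / n, sum(y) / n
    d2 = [a - m2 for a in x2]
    d3 = [a - m3 for a in x3]
    dy = [a - my for a in y]
    s22 = sum(a * a for a in d2)
    s33 = sum(a * a for a in d3)
    s23 = sum(a * b for a, b in zip(d2, d3))
    s2y = sum(a * b for a, b in zip(d2, dy))
    s3y = sum(a * b for a, b in zip(d3, dy))
    det = s22 * s33 - s23 ** 2
    b2 = (s2y * s33 - s3y * s23) / det
    b3 = (s3y * s22 - s2y * s23) / det
    return my - b2 * m2 - b3 * m3, b2, b3


def rss(a, b_cap, b_lab, ln_y, ln_k, ln_l):
    """Residual sum of squares of ln Y around a + b_cap ln K + b_lab ln L."""
    return sum((y - a - b_cap * kk - b_lab * ll) ** 2 for y, kk, ll in zip(ln_y, ln_k, ln_l))


ln_y = [math.log(out) for out, _, _ in TABLE_8_8]
ln_k = [math.log(cap) for _, cap, _ in TABLE_8_8]
ln_l = [math.log(lab) for _, _, lab in TABLE_8_8]

# Unrestricted regression, as in Eq. (8.6.13).
a_ur, bk_ur, bl_ur = ols_two(ln_k, ln_l, ln_y)
rss_ur = rss(a_ur, bk_ur, bl_ur, ln_y, ln_k, ln_l)

# Restricted regression, as in Eq. (8.6.8): ln(Output/Labour) on ln(Capital/Labour).
a_r, bk_r = ols_one([kk - ll for kk, ll in zip(ln_k, ln_l)], [y - ll for y, ll in zip(ln_y, ln_l)])
bl_r = 1.0 - bk_r                                # labour elasticity implied by the restriction
rss_r = rss(a_r, bk_r, bl_r, ln_y, ln_k, ln_l)   # same residuals as the transformed regression

n_obs, n_params, m = len(TABLE_8_8), 3, 1
f_stat = ((rss_r - rss_ur) / m) / (rss_ur / (n_obs - n_params))
print(f"unrestricted: a={a_ur:.3f} bK={bk_ur:.3f} bL={bl_ur:.3f} RSS={rss_ur:.3f}")
print(f"restricted:   a={a_r:.3f} bK={bk_r:.3f} bL={bl_r:.3f} RSS={rss_r:.3f} F={f_stat:.2f}")
```

Computing the restricted residuals in terms of ln Output shows why the two RSS values are comparable even though the two regressions have different dependent variables, while their $R^2$ values are not.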
Since the dependent variable in the preceding two regressions is different, we have to use the F test given in Eq. (8.6.9). We have the necessary data to obtain the F value.

$$
\begin{aligned}
F &= \frac{(\text{RSS}_{\text{R}} - \text{RSS}_{\text{UR}})/m}{\text{RSS}_{\text{UR}}/(n-k)}\\
  &= \frac{(4.355 - 3.967)/1}{3.967/(31 - 3)}\\
  &= 2.74
\end{aligned}
$$

Note that in the present case m = 1, as we have imposed only one restriction, and $(n-k)$ is 28, since we have 31 observations and three parameters in the unrestricted regression.

This F value follows the F distribution with 1 df in the numerator and 28 df in the denominator. The reader can easily check that this F value is not significant at the 5 percent level (see Appendix D, Table D.3).

The conclusion then is that the Indian economy was probably characterized by constant returns to scale over the sample period, and therefore there may be no harm in using the restricted regression given in Eq. (8.6.14). As this regression shows, if the capital/labor ratio increased by 1 percent, on average, labor productivity went up by about 0.82 percent.
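Because the two regressions have different dependent variables, the F statistic has to come from the residual sums of squares, Eq. (8.6.9). A short Python sketch of that arithmetic follows; the helper name is hypothetical, and the 5 percent critical value of about 4.20 for F(1, 28) is taken from a standard F table rather than from the text.

```python
def f_from_rss(rss_restricted: float, rss_unrestricted: float, m: int, n: int, k: int) -> float:
    """Eq. (8.6.9): F = [(RSS_R - RSS_UR)/m] / [RSS_UR/(n - k)]."""
    return ((rss_restricted - rss_unrestricted) / m) / (rss_unrestricted / (n - k))


f_stat = f_from_rss(rss_restricted=4.355, rss_unrestricted=3.967, m=1, n=31, k=3)
f_critical_5pct = 4.20   # approximate tabulated F(1, 28) value at the 5 percent level (assumed)
print(f"F = {f_stat:.2f}; reject constant returns at 5 percent: {f_stat > f_critical_5pct}")
```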


General F Testing¹⁴


The F test given in Eq. (8.6.10) or its equivalent in Eq. (8.6.9) provides a general method of testing hypotheses about one or more parameters of the k-variable regression model:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \qquad (8.6.15)$$

The F test of Eq. (8.4.16) or the t test of Eq. (8.5.3) is but a specific application of Eq. (8.6.10). Thus, hypotheses such as

$$H_0:\ \beta_2 = \beta_3 \qquad (8.6.16)$$

$$H_0:\ \beta_3 + \beta_4 + \beta_5 = 3 \qquad (8.6.17)$$

which involve some linear restrictions on the parameters of the k-variable model, or hypotheses such as

$$H_0:\ \beta_3 = \beta_4 = \beta_5 = \beta_6 = 0 \qquad (8.6.18)$$

which imply that some regressors are absent from the model, can all be tested by the F test of Eq. (8.6.10).

From the discussion in Sections 8.4 and 8.6, the reader will have noticed that the general strategy of F testing is this: There is a larger model, the unconstrained model (8.6.15), and then there is a smaller model, the constrained or restricted model, which is obtained from the larger model by deleting some variables from it, e.g., Eq. (8.6.18), or by putting some linear restrictions on one or more coefficients of the larger model, e.g., Eq. (8.6.16) or Eq. (8.6.17).

We then fit the unconstrained and constrained models to the data and obtain the respective coefficients of determination, namely, $R^2_{\text{UR}}$ and $R^2_{\text{R}}$. We note the df in the unconstrained model ($= n - k$) and also note the df in the constrained model ($= m$), m being the number of linear restrictions (e.g., 1 in Eq. [8.6.16] or Eq. [8.6.17]) or the number of regressors omitted from the model (e.g., m = 4 if Eq. [8.6.18] holds, since four regressors are assumed to be absent from the model). We then compute the F ratio as indicated in Eq. (8.6.9) or Eq. (8.6.10) and use this Decision Rule: If the computed F exceeds $F_\alpha(m, n-k)$, where $F_\alpha(m, n-k)$ is the critical F at the $\alpha$ level of significance, we reject the null hypothesis; otherwise we do not reject it.


14 If one is using the maximum likelihood approach to estimation, then a test similar to the one discussed shortly is the likelihood ratio test, which is slightly involved and is therefore discussed in the appendix to the chapter. For further discussion, see Theil, op. cit., pp. 179-184.
Let us illustrate:


Example 8.4 The Demand for Potatoes in India, 1992-93 


In Exercise 7.19, among other things, you were asked to consider the following demand function for potatoes:

$$\ln Y_i = \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + \beta_4 \ln X_{4i} + \beta_5 \ln X_{5i} + u_i \qquad (8.6.19)$$

where Y = per capita consumption of potatoes in kg, $X_2$ = income per capita in thousand rupees at 1993-94 prices, $X_3$ = price of potatoes in rupees per kg, $X_4$ = price of cabbage in rupees per kg, and $X_5$ = price of cauliflower in rupees per kg.


In this model $\beta_2$, $\beta_3$, $\beta_4$, and $\beta_5$ are, respectively, the income, own-price, cross-price (cabbage), and cross-price (cauliflower) elasticities (Why?). According to economic theory,

$$
\begin{aligned}
&\beta_2 > 0\\
&\beta_3 < 0\\
&\beta_4 \begin{cases} > 0 & \text{if potato and cabbage are competing products}\\ < 0 & \text{if potato and cabbage are complementary products}\\ = 0 & \text{if potato and cabbage are unrelated products}\end{cases}\\
&\beta_5 \begin{cases} > 0 & \text{if potato and cauliflower are competing products}\\ < 0 & \text{if potato and cauliflower are complementary products}\\ = 0 & \text{if potato and cauliflower are unrelated products}\end{cases}
\end{aligned}
\qquad (8.6.20)
$$


Suppose someone maintains that potatoes are unrelated to both cabbage and cauliflower, in the sense
that potato consumption is not affected by the prices of cabbage and cauliflower. In short,


$$H_0: \beta_4 = \beta_5 = 0 \tag{8.6.21}$$
Therefore, the constrained regression becomes
$$\ln Y_i = \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i \tag{8.6.22}$$


Equation (8.6.19) is of course the unconstrained regression. 
Using the data given in Exercise 7.19, we obtain the following: 


Unconstrained regression:
$$\begin{aligned}
\widehat{\ln Y}_i &= 0.903 + 0.561 \ln X_{2i} - 2.008 \ln X_{3i} - 0.553 \ln X_{4i} + 0.778 \ln X_{5i}\\
\text{se} &= (1.280) \qquad (0.405) \qquad (0.765) \qquad (0.712) \qquad (0.491)\\
R^2_{UR} &= 0.677
\end{aligned} \tag{8.6.23}$$
Constrained regression:
$$\begin{aligned}
\widehat{\ln Y}_i &= 1.379 + 0.594 \ln X_{2i} - 2.139 \ln X_{3i}\\
\text{se} &= (1.233) \qquad (0.406) \qquad (0.465)\\
R^2_{R} &= 0.620
\end{aligned} \tag{8.6.24}$$
where the figures in parentheses are the estimated standard errors. Note: The R² values of Eqs. (8.6.23) and
(8.6.24) are comparable since the dependent variable in the two models is the same.
Now the F ratio to test the hypothesis of Eq. (8.6.21) is
$$F = \frac{(R^2_{UR} - R^2_{R})/m}{(1 - R^2_{UR})/(n - k)} \tag{8.6.10}$$


The value of m in the present case is 2, since there are two restrictions involved: β₄ = 0 and β₅ = 0. The
denominator df, (n − k), is 15, since n = 20 and k = 5 (five β coefficients).


270 Basic Econometrics 


Therefore, the F ratio is
$$F = \frac{(0.677 - 0.620)/2}{(1 - 0.677)/15} \approx 1.32 \tag{8.6.25}$$


which has the F distribution with 2 and 15 df. 

At the 5 percent level, this F value is clearly not statistically significant [F_0.05(2, 15) = 3.68]. Therefore, there is
no reason to reject the null hypothesis: the demand for potatoes does not depend on cabbage and cauliflower
prices. In short, we can accept the constrained regression (8.6.24) as representing the demand function for
potatoes.

Note that the demand function satisfies a priori economic expectations in that the own-price elasticity is
negative and the income elasticity is positive. However, the estimated price elasticity, in absolute value,
is statistically greater than unity, implying that the demand for potatoes is price elastic (why?). Also, the income
elasticity, although positive, is statistically less than unity, suggesting that potatoes are not a luxury item; by
convention, an item is said to be a luxury item if its income elasticity is greater than 1.


8.7 Testing for Structural or Parameter Stability of Regression Models: 
The Chow Test 


When we use a regression model involving time series data, it may happen that there is a structural change
in the relationship between the regressand Y and the regressors. By structural change, we mean that the
values of the parameters of the model do not remain the same through the entire time period. Sometimes the 
structural change may be due to external forces (e.g., the oil embargoes imposed by the OPEC oil cartel in 
1973 and 1979 or the Gulf War of 1990-1991), policy changes (such as the switch from a fixed exchange-rate 
system to a more flexible exchange-rate system in 1993), actions taken by Government (e.g., the economic 
stabilization and trade liberalization policies of 1991 or changes in the minimum wage rate), or a variety of 
other causes. 

How do we find out that a structural change has in fact occurred? To be specific, consider the data given
in Table 8.9. This table gives data on disposable personal income and gross domestic savings, in crores of
rupees, for India for the period 1974-75 to 1995-96. Suppose we want to estimate a simple savings function
that relates savings (Y) to disposable personal income DPI (X). Since we have the data, we can obtain an OLS
regression of Y on X. But if we do that, we are maintaining that the relationship between savings and DPI has
not changed much over the span of 22 years. That may be a tall assumption. For example, India undertook
economic stabilization and trade liberalization in 1991, when India did away with the license raj. An event
such as this might disturb the relationship between savings and DPI. To see if this happened, let us divide
our sample data into two time periods, 1974-75 to 1988-89 and 1989-90 to 1995-96, the pre- and
post-liberalization periods.

Now we have three possible regressions: 


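Time period 1974-75 to 1988-89:
$$Y_t = \lambda_1 + \lambda_2 X_t + u_{1t}, \qquad n_1 = 15 \tag{8.7.1}$$
Time period 1989-90 to 1995-96:
$$Y_t = \gamma_1 + \gamma_2 X_t + u_{2t}, \qquad n_2 = 7 \tag{8.7.2}$$
Time period 1974-75 to 1995-96:
$$Y_t = \alpha_1 + \alpha_2 X_t + u_t, \qquad n = (n_1 + n_2) = 22 \tag{8.7.3}$$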


Regression (8.7.3) assumes that there is no difference between the two time periods and therefore estimates 
the relationship between savings and DPI for the entire time period consisting of 22 observations. In other 
words, this regression assumes that the intercept as well as the slope coefficient remains the same over the 
entire period; that is, there is no structural change. If this is in fact the situation, then α₁ = λ₁ = γ₁ and
α₂ = λ₂ = γ₂.


Multiple Regression Analysis: The Problem of Inference 271 


Table 8.9 Savings and Personal Disposable Income (Rs. Crore), India 1974-75 to 1995-96

Year       Savings     Income       Year       Savings     Income
1974-75     12,298     64,968       1985-86     53,389    229,527
1975-76     14,196     69,233       1986-87     58,036    256,413
1976-77     17,320     73,824       1987-88     72,264    291,585
1977-78     19,995     85,267       1988-89     87,166    345,011
1978-79     23,601     91,507       1989-90    106,092    395,239
1979-80     24,213     99,632       1990-91    130,010    465,097
1980-81     26,881    123,067       1991-92    141,089    531,515
1981-82     30,896    142,181       1992-93    159,682    618,587
1982-83     33,787    157,291       1993-94    189,933    716,964
1983-84     38,091    185,749       1994-95    247,462    842,261
1984-85     45,453    207,491       1995-96    291,002    959,733
Source: Handbook of Statistics on Indian Economy (2009-10), Reserve Bank of India, Mumbai. 


Note: Income = Personal disposable income measured in Rupee crore in 1999-2000 prices. 
Savings = Gross domestic savings measured in Rupee crore, 1999-2000 prices 


Regressions (8.7.1) and (8.7.2) assume that the regressions in the two time periods are different: that is, the 
intercept and the slope coefficients are different, as indicated by the subscripted parameters. In the preceding 
regressions, the u's represent the error terms and the n’s represent the number of observations. 

For the data given in Table 8.9, the empirical counterparts of the preceding three regressions are as follows: 


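$$\begin{aligned}
\hat Y_t &= -3004.96 + 0.25\,X_t\\
t &= (-1.86) \qquad (28.60)\\
R^2 &= 0.9844 \qquad \text{RSS}_1 = 107092134.53 \qquad \text{df} = 13
\end{aligned} \tag{8.7.1a}$$

$$\begin{aligned}
\hat Y_t &= -28600.51 + 0.32\,X_t\\
t &= (-2.06) \qquad (15.70)\\
R^2 &= 0.9801 \qquad \text{RSS}_2 = 534082174.63 \qquad \text{df} = 5
\end{aligned} \tag{8.7.2a}$$

$$\begin{aligned}
\hat Y_t &= -10991.35 + 0.30\,X_t\\
t &= (-4.47) \qquad (49.40)\\
R^2 &= 0.9919 \qquad \text{RSS}_3 = 1073670901.40 \qquad \text{df} = 20
\end{aligned} \tag{8.7.3a}$$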


In the preceding regressions, RSS denotes the residual sum of squares, and the figures in parentheses are
the estimated t values.

A look at the estimated regressions suggests that the relationship between savings and DPI is not the
same in the two subperiods. The slope in the preceding savings-income regressions represents the marginal
propensity to save (MPS), that is, the (mean) change in savings as a result of a rupee increase in disposable
personal income. In the period 1974-75 to 1988-89 the MPS was about 0.25, whereas in the period 1989-90 to
1995-96 it was about 0.32. Whether this change was due to the economic policies pursued by the Government
of India is hard to say. The difference in the MPS further suggests that the pooled regression (8.7.3a), that is,
the one that pools all 22 observations and runs a common regression, disregarding possible differences
between the two subperiods, may not be appropriate. Of course, the preceding statements need to be supported
by an appropriate statistical test(s). Incidentally, the scattergrams and the estimated regression lines are as shown in Figure 8.3.
Figure 8.3


Now the possible differences, that is, structural changes, may be caused by differences in the intercept or 
the slope coefficient or both. How do we find that out? A visual feeling about this can be obtained as shown 
in Figure 8.3. But it would be useful to have a formal test. 

This is where the Chow test comes in handy.¹⁵ This test assumes that:


1. u₁t ~ N(0, σ²) and u₂t ~ N(0, σ²). That is, the error terms in the subperiod regressions are normally
distributed with the same (homoscedastic) variance σ².
2. The two error terms u₁t and u₂t are independently distributed.


The mechanics of the Chow test are as follows: 

1. Estimate regression (8.7.3), which is appropriate if there is no parameter instability, and obtain RSS₃
with df = (n₁ + n₂ − k), where k is the number of parameters estimated, 2 in the present case. For our
example RSS₃ = 1,07,36,70,901.40. We call RSS₃ the restricted residual sum of squares (RSS_R) because
it is obtained by imposing the restrictions that λ₁ = γ₁ and λ₂ = γ₂, that is, the subperiod regressions are not
different.


15Gregory C. Chow, “Tests of Equality Between Sets of Coefficients in Two Linear Regressions,” Econometrica, vol. 28,
no. 3, 1960, pp. 591-605.
2. Estimate Eq. (8.7.1) and obtain its residual sum of squares, RSS₁, with df = (n₁ − k). In our example,
RSS₁ = 10,70,92,134.53 and df = 13.

3. Estimate Eq. (8.7.2) and obtain its residual sum of squares, RSS₂, with df = (n₂ − k). In our example,
RSS₂ = 53,40,82,174.63 with df = 5.

4. Since the two sets of samples are deemed independent, we can add RSS₁ and RSS₂ to obtain what may
be called the unrestricted residual sum of squares (RSS_UR), that is,

$$\text{RSS}_{UR} = \text{RSS}_1 + \text{RSS}_2 \quad \text{with df} = (n_1 + n_2 - 2k)$$

In the present case,
RSS_UR = 10,70,92,134.53 + 53,40,82,174.63 = 64,11,74,309.16


5. Now the idea behind the Chow test is that if in fact there is no structural change (i.e., regressions
[8.7.1] and [8.7.2] are essentially the same), then RSS_R and RSS_UR should not be statistically different.
Therefore, if we form the following ratio:

$$F = \frac{(\text{RSS}_R - \text{RSS}_{UR})/k}{\text{RSS}_{UR}/(n_1 + n_2 - 2k)} \sim F_{[k,\ (n_1 + n_2 - 2k)]} \tag{8.7.4}$$


then Chow has shown that, under the null hypothesis that the regressions (8.7.1) and (8.7.2) are (statistically) the
same (i.e., no structural change or break), the F ratio given above follows the F distribution with k and
(n₁ + n₂ − 2k) df in the numerator and denominator, respectively.

6. Therefore, we do not reject the null hypothesis of parameter stability (i.e., no structural change) if
the computed F value in an application does not exceed the critical F value obtained from the F table at the
chosen level of significance (or the p value). In this case we may be justified in using the pooled (restricted)
regression (8.7.3). Conversely, if the computed F value exceeds the critical F value, we reject the hypothesis
of parameter stability and conclude that the regressions (8.7.1) and (8.7.2) are different, in which case the
pooled regression (8.7.3) is of dubious value, to say the least.


Returning to our example, we find that

$$F = \frac{(1{,}07{,}36{,}70{,}901.40 - 64{,}11{,}74{,}309.16)/2}{(64{,}11{,}74{,}309.16)/18} = 6.07 \tag{8.7.5}$$


From the F tables, we find that for 2 and 18 df the 5 percent critical F value is 3.55. Therefore, the probability
of obtaining an F value as large as or larger than 6.07 is much smaller than 5 percent.

The Chow test therefore seems to support our earlier hunch that the savings-income relation has undergone
a structural change in India over the period 1974-75 to 1995-96, assuming that the assumptions underlying
the test are fulfilled. We will have more to say about this shortly.
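As a check on the arithmetic, here is a Python sketch (assuming NumPy) that re-estimates regressions (8.7.1) to (8.7.3) from Table 8.9 and forms the Chow F ratio of Eq. (8.7.4). The helper names `ols_rss` and `chow_f` are hypothetical. `chow_f` accepts a list of subperiod RSS values. With g subperiods it uses (g − 1)k numerator df and n − gk denominator df. With g = 2 this reduces to Eq. (8.7.4), and with g = 3 it gives one way to carry out the three-period version of the test mentioned below.

```python
import numpy as np


def ols_rss(y, x):
    """OLS of y on a constant and one regressor x; returns (RSS, [intercept, slope])."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), beta


def chow_f(rss_pooled, rss_sub, n, k):
    """Chow F ratio for g = len(rss_sub) subperiods (g = 2 gives Eq. 8.7.4).

    rss_pooled : RSS_R from the pooled (restricted) regression
    rss_sub    : RSS of each subperiod regression; their sum is RSS_UR
    n, k       : total observations and parameters per regression
    """
    g = len(rss_sub)
    rss_ur = sum(rss_sub)
    df_num = (g - 1) * k          # equals k when g = 2
    df_den = n - g * k            # equals n1 + n2 - 2k when g = 2
    f_stat = ((rss_pooled - rss_ur) / df_num) / (rss_ur / df_den)
    return f_stat, df_num, df_den


# Table 8.9: gross domestic savings (Y) and disposable income (X), Rs. crore
savings = np.array([12298, 14196, 17320, 19995, 23601, 24213, 26881, 30896,
                    33787, 38091, 45453, 53389, 58036, 72264, 87166,       # 1974-75 to 1988-89
                    106092, 130010, 141089, 159682, 189933, 247462, 291002], dtype=float)
income = np.array([64968, 69233, 73824, 85267, 91507, 99632, 123067, 142181,
                   157291, 185749, 207491, 229527, 256413, 291585, 345011,
                   395239, 465097, 531515, 618587, 716964, 842261, 959733], dtype=float)

n1 = 15                                             # observations before the break
rss1, b1 = ols_rss(savings[:n1], income[:n1])       # regression (8.7.1)
rss2, b2 = ols_rss(savings[n1:], income[n1:])       # regression (8.7.2)
rss3, b3 = ols_rss(savings, income)                 # pooled regression (8.7.3)

f_data, dfn, dfd = chow_f(rss3, [rss1, rss2], n=len(savings), k=2)
f_book, _, _ = chow_f(1073670901.40, [107092134.53, 534082174.63], n=22, k=2)
print(f"MPS: {b1[1]:.2f} (first period), {b2[1]:.2f} (second), {b3[1]:.2f} (pooled)")
print(f"Chow F from the data = {f_data:.2f}; from the reported RSS = {f_book:.2f}; df = ({dfn}, {dfd})")
```

The pooled regression is a restricted version of the separate subperiod regressions, so RSS_R can never be smaller than RSS₁ + RSS₂. The F ratio is therefore never negative.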

Incidentally, note that the Chow test can be easily generalized to handle cases of more than one structural
break. For example, if we believe that the savings-income relation changed in the time period following
1980, when the banking sector expanded in India, we could divide our sample into three periods (1974-75 to
1979-80, 1980-81 to 1989-90, and 1990-91 to 1995-96) and carry out the Chow test. Of course, we will then have
four RSS terms, one for each subperiod and one for the pooled data, but the logic of the test remains the
same. Data through 2007 are now available to extend the last period to 2007.

There are some caveats about the Chow test that must be kept in mind: 

1. The assumptions underlying the test must be fulfilled. For example, one should find out if the error 
variances in the regressions (8.7.1) and (8.7.2) are the same. We will discuss this point shortly. 




2. The Chow test will tell us only if the two regressions (8.7.1) and (8.7.2) are different, without telling 
us whether the difference is on account of the intercepts, or the slopes, or both. But in Chapter 9, on dummy 
variables, we will see how we can answer this question. 

3. The Chow test assumes that we know the point(s) of structural break. In our example, we assumed it to
be in 1989-90. However, if it is not possible to determine when the structural change actually took place, we
may have to use other methods.¹⁶

Before we leave the Chow test and our savings-income regression, let us examine one of the assumptions
underlying the Chow test, namely, that the error variances in the two periods are the same. Since we cannot
observe the true error variances, we can obtain their estimates from the RSS given in the regressions (8.7.1a)
and (8.7.2a), namely,


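$$\hat\sigma_1^2 = \frac{\text{RSS}_1}{n_1 - 2} = \frac{107092134.53}{13} = 8237856.50 \tag{8.7.6}$$

$$\hat\sigma_2^2 = \frac{\text{RSS}_2}{n_2 - 2} = \frac{534082174.63}{5} = 106816434.93 \tag{8.7.7}$$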


Notice that, since there are two parameters estimated in each equation, we deduct 2 from the number of
observations to obtain the df. Given the assumptions underlying the Chow test, σ̂₁² and σ̂₂² are unbiased
estimators of the true variances in the two subperiods. As a result, if σ₁² = σ₂², that is, the variances in the two
subpopulations are the same (as assumed by the Chow test), then it can be shown that


$$\frac{\hat\sigma_1^2/\sigma_1^2}{\hat\sigma_2^2/\sigma_2^2} \sim F_{(n_1 - k),\ (n_2 - k)} \tag{8.7.8}$$

follows the F distribution with (n₁ − k) and (n₂ − k) df in the numerator and the denominator, respectively; in
our example k = 2, since there are only two parameters in each subregression.
Of course, if σ₁² = σ₂², the preceding F test reduces to computing

$$F = \frac{\hat\sigma_1^2}{\hat\sigma_2^2} \tag{8.7.9}$$


Note: By convention we put the larger of the two estimated variances in the numerator. (See Appendix A for 
the details of the F and other probability distributions.) 

Computing this F in an application and comparing it with the critical F value with the appropriate df, one
can decide to reject or not reject the null hypothesis that the variances in the two subpopulations are the same.
If the null hypothesis is not rejected, then one can use the Chow test.

Returning to our savings-income regression, we obtain the following result:

$$F = \frac{106816434.93}{8237856.50} = 12.97 \tag{8.7.10}$$

Under the null hypothesis of equality of variances in the two subpopulations, this F value follows the F
distribution with 5 and 13 df in the numerator and denominator, respectively. (Note: We have put the larger
of the two estimated variances in the numerator.) From the F tables in Appendix D, we see that the 5 and 1
percent critical F values for 5 and 13 df are 3.03 and 4.86, respectively. The computed F value is significant
at both the 5 and 1 percent levels. Thus, our conclusion would be that the two subpopulation variances are not
the same and, therefore, strictly speaking we should not use the Chow test.
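The variance-ratio check can be sketched the same way. This Python snippet, assuming SciPy, takes the RSS and df reported in (8.7.1a) and (8.7.2a) and forms the estimates (8.7.6) and (8.7.7). It puts the larger one in the numerator as in (8.7.10). It then obtains the 5 and 1 percent critical values from the F distribution instead of from Appendix D.

```python
from scipy import stats

# RSS and df reported in regressions (8.7.1a) and (8.7.2a)
rss1, df1 = 107092134.53, 13   # 1974-75 to 1988-89: n1 - k = 15 - 2
rss2, df2 = 534082174.63, 5    # 1989-90 to 1995-96: n2 - k = 7 - 2

var1 = rss1 / df1              # Eq. (8.7.6)
var2 = rss2 / df2              # Eq. (8.7.7)

# Convention: larger estimated variance in the numerator, with its df listed first
if var2 >= var1:
    f_stat, dfn, dfd = var2 / var1, df2, df1
else:
    f_stat, dfn, dfd = var1 / var2, df1, df2

crit_5 = stats.f.ppf(0.95, dfn, dfd)   # 5 percent critical value
crit_1 = stats.f.ppf(0.99, dfn, dfd)   # 1 percent critical value
print(f"sigma1^2 = {var1:.2f}, sigma2^2 = {var2:.2f}")
print(f"F = {f_stat:.2f} with ({dfn}, {dfd}) df; critical values: 5% = {crit_5:.2f}, 1% = {crit_1:.2f}")
```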


16For a detailed discussion, see William H. Greene, Econometric Analysis, 4th ed., Prentice Hall, Englewood Cliffs, NJ, 2000, 
pp. 293-297. 
Our purpose here has been to demonstrate the mechanics of the Chow test, which is used popularly in
applied work. If the error variances in the two subpopulations differ (i.e., the errors are heteroscedastic), the
Chow test can be modified. But the procedure is beyond the scope of this book.¹⁷

Another point we made earlier was that the Chow test is sensitive to the choice of the time at which the 
regression parameters might have changed. In our example, we assumed that the change probably took place 
in 1991. If we had assumed it to be 1981, when the banking sector expansion took place in India under Prime 
Minister Indira Gandhi, we might have found the computed F value to be different. As a matter of fact, in 
Exercise 8.34 the reader is asked to check this out. 

If we do not want to choose the point at which the break in the underlying relationship might have 
occurred, we could choose alternative methods, such as the recursive residual test. We will take this topic 
up in Chapter 13, the chapter on model specification analysis. 


8.8 Prediction with Multiple Regression 


In Section 5.10 we showed how the estimated two-variable regression model can be used for (1) mean 
prediction, that is, predicting the point on the population regression function (PRF), as well as for (2) 
individual prediction, that is, predicting an individual value of Y given the value of the regressor X = X₀,
where X₀ is the specified numerical value of X.

The estimated multiple regression too can be used for similar purposes, and the procedure for doing that 
is a straightforward extension of the two-variable case, except that the formulas for estimating the variances and
standard errors of the forecast value (comparable to Eqs. [5.10.2] and [5.10.6] of the two-variable model) 
are rather involved and are better handled by the matrix methods discussed in Appendix C. Of course, most 
standard regression packages can do this routinely, so there is no need to look up the matrix formulation. It 
is given in Appendix C for the benefit of the mathematically inclined students. This appendix also gives a 
fully worked out example. 
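For readers who want to see the matrix formulas at work, here is a minimal NumPy sketch of mean and individual prediction in a multiple regression. It assumes the design matrix contains a column of ones. It uses the standard results var(mean forecast) = σ̂² x₀′(X′X)⁻¹x₀ and var(individual forecast) = σ̂²[1 + x₀′(X′X)⁻¹x₀]. The function name `predict_with_se` and the small data set are hypothetical and serve only as an illustration.

```python
import numpy as np


def predict_with_se(X, y, x0):
    """Point forecast and standard errors for mean and individual prediction.

    X  : (n, k) design matrix whose first column is ones
    y  : (n,) observations on the regressand
    x0 : (k,) regressor values at which to predict, with a leading 1
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                   # OLS estimates
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)           # unbiased estimate of sigma^2
    y0_hat = x0 @ beta                         # same point forecast in both cases
    h0 = x0 @ XtX_inv @ x0                     # x0'(X'X)^{-1} x0
    se_mean = np.sqrt(sigma2 * h0)             # mean prediction
    se_indiv = np.sqrt(sigma2 * (1.0 + h0))    # individual prediction
    return y0_hat, se_mean, se_indiv


# Hypothetical data, used only to exercise the function
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])
X = np.column_stack([np.ones_like(x), x])
y0, se_m, se_i = predict_with_se(X, y, np.array([1.0, 4.5]))
print(f"forecast = {y0:.3f}, se(mean) = {se_m:.3f}, se(individual) = {se_i:.3f}")
```

With a single regressor, these matrix expressions reduce to the two-variable formulas of Section 5.10, which makes a convenient check on any implementation.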


*8.9 The Troika of Hypothesis Tests: The Likelihood Ratio (LR),
Wald (W), and Lagrange Multiplier (LM) Tests¹⁸


In this and the previous chapters we have, by and large, used the t, F, and chi-square tests to test a variety of
hypotheses in the context of linear (in-parameter) regression models. But once we go beyond the somewhat
comfortable world of linear regression models, we need methods to test hypotheses that can handle
regression models, linear or not.

The well-known trinity of likelihood ratio, Wald, and Lagrange multiplier tests can accomplish this purpose.
The interesting thing to note is that asymptotically (i.e., in large samples) all three tests are equivalent in that
the test statistic associated with each of these tests follows the chi-square distribution.


*Optional. 

17For a discussion of the Chow test under heteroscedasticity, see William H. Greene, Econometric Analysis, 4th ed., Prentice 
Hall, Englewood Cliffs, NJ, 2000, pp. 292-293, and Adrian C. Darnell, A Dictionary of Econometrics, Edward Elgar, U.K., 
1994, p. 51. 

18For an accessible discussion, see A. Buse, “The Likelihood Ratio, Wald and Lagrange Multiplier Tests: An Expository Note,” 
American Statistician, vol. 36, 1982, pp. 153-157. 
Although we will discuss the likelihood ratio test in the appendix to this chapter, in general we will not
use these tests in this textbook for the pragmatic reason that in small, or finite, samples, which is unfortunately 
what most researchers deal with, the F test that we have used so far will suffice. As Davidson and MacKinnon 
note: 


For linear regression models, with or without normal errors, there is of course no need to look at LM, W, and LR at
all, since no information is gained from doing so over and above what is already contained in F.¹⁹
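The point of the quotation can be illustrated numerically. Consider a linear regression with normal errors and m linear restrictions. A standard textbook result expresses the three statistics through the restricted and unrestricted RSS, using maximum-likelihood variance estimates:

- W = n(RSS_R − RSS_UR)/RSS_UR
- LR = n ln(RSS_R/RSS_UR)
- LM = n(RSS_R − RSS_UR)/RSS_R

Each is a monotone function of the F ratio, and in finite samples W ≥ LR ≥ LM. The Python sketch below applies these formulas to the Chow-test RSS values of Section 8.7. It treats the two subperiod regressions as one unrestricted model with 4 parameters. The function names are hypothetical.

```python
import math


def trinity_from_rss(rss_r, rss_ur, n):
    """W, LR and LM statistics for linear restrictions in a normal linear
    regression, written in terms of restricted and unrestricted RSS."""
    ratio = rss_r / rss_ur                 # >= 1: restrictions cannot lower the RSS
    wald = n * (ratio - 1.0)
    lr = n * math.log(ratio)
    lm = n * (1.0 - 1.0 / ratio)
    return wald, lr, lm


def f_from_rss(rss_r, rss_ur, m, n, k):
    """Usual F ratio for m restrictions; k = parameters in the unrestricted model."""
    return ((rss_r - rss_ur) / m) / (rss_ur / (n - k))


# Chow-test figures: pooled (restricted) vs. separate subperiods (unrestricted)
rss_r, rss_ur = 1073670901.40, 641174309.16
n, m, k = 22, 2, 4
wald, lr, lm = trinity_from_rss(rss_r, rss_ur, n)
f_stat = f_from_rss(rss_r, rss_ur, m, n, k)
print(f"F = {f_stat:.2f}; W = {wald:.2f}, LR = {lr:.2f}, LM = {lm:.2f}")
# W is an exact rescaling of F: W = n * m * F / (n - k)
print(math.isclose(wald, n * m * f_stat / (n - k)))
```

Because all three are functions of the same RSS ratio, they carry no information beyond F for the linear model. This is the point made in the quotation.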


*8.10 Testing the Functional Form of Regression: Choosing between
Linear and Log-Linear Regression Models


The choice between a linear regression model (the regressand is a linear function of the regressors) or a
log-linear regression model (the log of the regressand is a function of the logs of the regressors) is a perennial
question in empirical analysis. We can use a test proposed by MacKinnon, White, and Davidson, which for
brevity we call the MWD test, to choose between the two models.²⁰

To illustrate this test, assume the following:

H₀: Linear Model: Y is a linear function of the regressors, the X's.
H₁: Log-Linear Model: ln Y is a linear function of the logs of the regressors, the logs of the X's.

where, as usual, H₀ and H₁ denote the null and alternative hypotheses.
The MWD test involves the following steps:²¹


Step I: Estimate the linear model and obtain the estimated Y values. Call them Yf (i.e., Ŷ).
Step II: Estimate the log-linear model and obtain the estimated ln Y values; call them ln f (i.e., ln Ŷ).
Step III: Obtain Z₁ = (ln Yf − ln f).
Step IV: Regress Y on the X's and Z₁ obtained in Step III. Reject H₀ if the coefficient of Z₁ is statistically
significant by the usual t test.
Step V: Obtain Z₂ = (antilog of ln f − Yf).
Step VI: Regress the log of Y on the logs of the X's and Z₂. Reject H₁ if the coefficient of Z₂ is statistically
significant by the usual t test.


Although the MWD test seems involved, the logic of the test is quite simple. If the linear model is in fact the
correct model, the constructed variable Z₁ should not be statistically significant in Step IV, for in that case the
estimated Y values from the linear model and those estimated from the log-linear model (after taking their
antilog values for comparative purposes) should not be different. The same comment applies to the alternative
hypothesis H₁.
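The six steps translate directly into code. The following NumPy sketch is one possible implementation, under these assumptions:

- The function names are hypothetical.
- The regressors are passed without a constant.
- All values of Y and of the X's are positive, because logs are taken.
- The fitted values from the linear model are also positive, since Step III takes their logarithm.

The data in the usage lines are artificial. They are not the chicken data of Example 8.5.

```python
import numpy as np


def ols_t_last(y, X):
    """OLS of y on X (X already contains a constant); t ratio of the last coefficient."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (n - k)
    return beta[-1] / np.sqrt(s2 * XtX_inv[-1, -1])


def mwd_test(y, X):
    """MWD test, Steps I-VI. X holds the regressors without a constant.
    Returns (t ratio of Z1 in Step IV, t ratio of Z2 in Step VI)."""
    ones = np.ones((len(y), 1))
    lin_X = np.hstack([ones, X])
    log_X = np.hstack([ones, np.log(X)])
    # Step I: fitted Y from the linear model (Yf)
    yf = lin_X @ np.linalg.lstsq(lin_X, y, rcond=None)[0]
    # Step II: fitted ln Y from the log-linear model (ln f)
    lnf = log_X @ np.linalg.lstsq(log_X, np.log(y), rcond=None)[0]
    if np.any(yf <= 0):
        raise ValueError("linear-model fitted values must be positive to take logs")
    # Steps III-IV: Z1 = ln Yf - ln f, added to the linear model
    z1 = np.log(yf) - lnf
    t_z1 = ols_t_last(y, np.column_stack([lin_X, z1]))
    # Steps V-VI: Z2 = antilog(ln f) - Yf, added to the log-linear model
    z2 = np.exp(lnf) - yf
    t_z2 = ols_t_last(np.log(y), np.column_stack([log_X, z2]))
    return t_z1, t_z2


# Artificial, deterministic data for illustration only
x2 = np.linspace(5.0, 10.0, 40)
x3 = 3.0 + np.cos(x2)                        # positive second regressor
y = 2.0 * x2**1.5 * x3**0.5 * np.exp(0.05 * np.sin(7.0 * x2))
t_z1, t_z2 = mwd_test(y, np.column_stack([x2, x3]))
print(f"t(Z1) = {t_z1:.2f}, t(Z2) = {t_z2:.2f}")
```

Each returned t ratio is then compared with the critical t value for the corresponding regression's n − k df, as in Steps IV and VI.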


*Optional.

19Russell Davidson and James G. MacKinnon, Estimation and Inference in Econometrics, Oxford University Press, New York, 
1993, p. 456. 

20J. MacKinnon, H. White, and R. Davidson, “Tests for Model Specification in the Presence of Alternative Hypotheses: Some
Further Results,” Journal of Econometrics, vol. 21, 1983, pp. 53-70. A similar test is proposed in A. K. Bera and C. M. Jarque,
“Model Specification Tests: A Simultaneous Approach,” Journal of Econometrics, vol. 20, 1982, pp. 59-82.


21This discussion is based on William H. Greene, ET. The Econometrics Toolkit Version 3, Econometric Software, Bellport, New 
York, 1992, pp. 245-246. 
Example 8.5 The Demand for Chicken


Refer to Exercise 7.16, where we have presented data on the demand for chicken in the states of India for 
1992-93. For illustrative purposes, we will consider the demand for chicken as a function of the prices of 
chicken and fish, leaving out the income variable for the time being. Now we consider the following models: 


Linear model:
$$Y_i = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + u_i \tag{8.10.1}$$
Log-linear model:
$$\ln Y_i = \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i \tag{8.10.2}$$


where Y is the quantity of chicken in kilograms, X₂ is the average price of chicken in rupees per kg, and X₃ is
the average price of fish in rupees per kg. A priori, α₂ and β₂ are expected to be negative (why?), and α₃ and
β₃ are expected to be positive (why?). As we know, the slope coefficients in the log-linear model are elasticity
coefficients.
The regression results are as follows: 
$$\begin{aligned}
\hat Y_i &= 0.4457 - 0.0095 X_{2i} + 0.0021 X_{3i}\\
t &= (6.0262) \qquad (-4.5938) \qquad (0.9514)\\
F &= 13.69 \qquad R^2 = 0.5903
\end{aligned} \tag{8.10.3}$$


$$\begin{aligned}
\widehat{\ln Y}_i &= -2.9845 - 0.8626 \ln X_{2i} + 0.9065 \ln X_{3i}\\
t &= (-3.8842) \qquad (-3.0215) \qquad (1.7899)\\
F &= 7.63 \qquad R^2 = 0.4455
\end{aligned} \tag{8.10.4}$$


As these results show, both the linear and log-linear models seem to fit the data reasonably well: the
parameters have the expected signs, the own-price coefficients are statistically significant, and the F values are statistically significant.

To decide between these models on the basis of the MWD test, we first test the hypothesis that the true 
model is linear. Then, following step IV of the test, we obtain the following regression: 


$$\begin{aligned}
\hat Y_i &= 0.5349 - 0.0100 X_{2i} + 0.0004 X_{3i} - 0.0673 Z_{1i}\\
t &= (8.0003) \qquad (-5.8373) \qquad (0.2259) \qquad (3.1903)\\
F &= 16.92 \qquad R^2 = 0.7383
\end{aligned} \tag{8.10.5}$$


Since the coefficient of Z₁ is statistically significant at the 1 percent level (the p value of the estimated t is 0.0051),
we reject the null hypothesis that the true model is linear at this level of significance. Hence the log-linear model
fits the sample data better.
Suppose we switch gears and assume that the true model is log-linear. Following Step VI of the MWD test,
we obtain the following regression results:

$$\begin{aligned}
\widehat{\ln Y}_i &= -2.9698 - 0.8803 \ln X_{2i} + 0.9202 \ln X_{3i} - 0.3947 Z_{2i}\\
t &= (-3.7358) \qquad (-2.7856) \qquad (1.7429) \qquad (-0.1504)\\
F &= 4.83 \qquad R^2 = 0.4462
\end{aligned} \tag{8.10.6}$$


The coefficient of Z₂ is not statistically significant (the p value of the estimated t is 0.88). Therefore, we do not
reject the null hypothesis that the true model is log-linear. Thus, both parts of the test lead to the same
conclusion: the log-linear specification fits the sample data better than the linear specification.
Summary and Conclusions


1. This chapter extended and refined the ideas of interval estimation and hypothesis testing first
introduced in Chapter 5 in the context of the two-variable linear regression model.

2. In a multiple regression, testing the individual significance of a partial regression coefficient (using the
t test) and testing the overall significance of the regression (i.e., H₀: all partial slope coefficients are
zero or R² = 0) are not the same thing.

3. In particular, the finding that one or more partial regression coefficients are statistically insignificant 
on the basis of the individual t test does not mean that all partial regression coefficients are also 
(collectively) statistically insignificant. The latter hypothesis can be tested only by the F test. 

4. The F test is versatile in that it can test a variety of hypotheses, such as whether (1) an individual 
regression coefficient is statistically significant, (2) all partial slope coefficients are zero, (3) two 
or more coefficients are statistically equal, (4) the coefficients satisfy some linear restrictions, and 
(5) there is structural stability of the regression model. 

5. As in the two-variable case, the multiple regression model can be used for the purpose of mean and/or 
individual prediction. 


Multiple Choice Questions


1. The assumption that the error terms in a regression model follow the normal distribution with zero
mean and constant variance is required for:
a. Point estimation of the parameters
b. Hypothesis testing and inference
c. Estimation of the regression model using OLS method
d. Both a and b
2. Given the regression model Yᵢ = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + uᵢ, how would you state the null hypothesis to test
that X₂ has no influence on Y with X₃ held constant?
a. H₀: β₁ = 0
b. H₀: β₂ = 0
c. H₀: β₃ = 0
d. H₀: β₂ = 0 given β₃ = 0
3. In hypothesis testing using t statistics, when the computed t value is found to exceed the critical t value
at the chosen level of significance, then
a. We reject the null hypothesis
b. We do not reject the null hypothesis
c. It depends on the alternative hypothesis
d. It depends on the F value
4. A hypothesis such as H₀: β₂ = β₃ = 0 means that
a. β₂ = 0 or β₃ = 0
b. β₂ = −β₃
c. β₂ = 0 and β₃ = 0
d. It is equivalent to testing the following individual hypotheses H₀: β₂ = 0 and β₃ = 0


5. A hypothesis such as H₀: β₂ = β₃ = 0 can be tested using
a. t-test
b. Chi-square test
c. ANOVA test
d. F-test
6. In the regression model Yᵢ = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + uᵢ, when testing the overall significance of the model using
the F-test, the degrees of freedom used are (k − 1) and (n − k), where k is equal to
a. 2
b. 3
c. 1
d. Sample size
7. When R² for a regression model is equal to zero, the F value is equal to
a. Infinity
b. High positive value
c. Low positive value
d. Zero
8. For the Cobb-Douglas production function Yᵢ = β₁X₂ᵢ^β₂ X₃ᵢ^β₃, where Y = output, X₂ = labour input
and X₃ = capital input, the test for the constant returns to scale hypothesis is stated as
a. H₀: β₁ + β₂ + β₃ = 1
b. H₀: β₂ + β₃ = 1
c. H₀: β₁ + β₂ + β₃ = 0
d. H₀: β₁ = β₂ = β₃ = 1
9. The test used to make a choice between a linear regression model and a log-linear model is
a. t-test
b. F-test
c. MWD test
d. Chow test
10. To test for a structural break in time series data, we use
a. t-test
b. F-test
c. MWD test
d. Chow test
11. In the multiple regression model, the adjusted R²
a. Cannot be negative
b. Will never be greater than the regression R²
c. Equals the square of the correlation coefficient r
d. Cannot decrease when an additional explanatory variable is added
12. Given below are alternative formulae for the F statistic. We are interested in testing the statistical
significance of the incremental contribution of X₃, at the 5% level of significance, to Y = β₁ + β₂X₂ + β₃X₃ + u.
Which of these IS NOT appropriate?
a. $$F = \frac{(\text{ESS}_{new} - \text{ESS}_{old})/\text{number of new regressors}}{\text{RSS}_{new}/(n-k)}$$
b. $$F = \frac{R^2_{new} - R^2_{old}}{(1 - R^2_{new})/(n-k)}$$
c. $$F = \frac{\text{ESS}/(k-1)}{\text{RSS}/(n-k)}$$
d. $$F = \frac{R^2/(k-1)}{(1 - R^2)/(n-k)}$$
Questions 13-15 are based on the following demand function for roses in India:
$$\ln Y_i = \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + \beta_4 \ln X_{4i} + \beta_5 \ln X_{5i} + u_i$$
where Y = Demand for roses in dozens
X₂ = Income per capita in thousand rupees at 1993-94 prices
X₃ = Price of roses in rupees per dozen
X₄ = Price of carnations in rupees per dozen
X₅ = Price of lily in rupees per dozen
13. If we want to test the hypothesis that the own-price elasticity is negative, as predicted by economic theory,
the null hypothesis we would be testing is
a. H₀: β₃ = 0
b. H₀: β₃ > 0
c. H₀: β₃ < 0
d. H₀: β₃ ≤ 0
14. If we want to test the hypothesis that roses and carnations are complementary products, the null
hypothesis we would be testing is
a. H₀: β₄ = 0
b. H₀: β₄ > 0
c. H₀: β₄ < 0
d. H₀: β₄ ≤ 0
15. To test the hypothesis that roses, carnations and lily are unrelated products, the null hypothesis to be
tested is
a. H₀: β₄ + β₅ = 0
b. H₀: β₄ = β₅ = 0
c. H₀: β₃ = β₄ = β₅ = 0
d. H₀: β₃ + β₄ + β₅ = 0
16. In testing the overall significance of the model Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + uᵢ using the F statistic, we test the
null hypothesis
a. β₀ = β₁ = β₂ = 0
b. β₀ = β₁ = 0
c. β₁ + β₂ = 0
d. β₁ = β₂ = 0
17. In testing the equality of the coefficients of two explanatory variables, say X₂ and X₃, for the function
ln Yᵢ = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + β₄X₄ᵢ + uᵢ, we use Student's t statistic with degrees of freedom equal to
a. n − 1
b. n − 2
c. M
d. n − k


18. In estimating a model, finding the F statistic for the overall significance test to be statistically significant
is the same as finding the t statistics for the individual explanatory variables (X's) to be statistically
significantly different from zero.
a. The statement is sometimes TRUE
b. The statement is always TRUE
c. Depends on the degrees of freedom and sample size
d. The F and t tests are different and hence are not comparable in the above manner.
19. In testing the restrictions imposed on a model, we calculate the F statistic and compare this value with the
table value F(m, n − k). In this formula, 'm' is the
a. Number of regressors in the two models taken together
b. Sample size of the restricted model
c. Number of X variables dropped in the restricted model
d. Number of parameters estimated in the restricted model
20. If you reject a joint null hypothesis using the F-test in a multiple hypothesis setting, then
a. A series of t-tests may or may not give you the same conclusion
b. The regression is always significant
c. All of the hypotheses are always simultaneously rejected
d. The F-statistics must be negative


Exercises


Questions


8.1. Suppose you want to study the behavior of sales of a product, say, automobiles, over a number of years,
and suppose someone suggests you try the following models:

$$Y_t = \beta_0 + \beta_1 t$$
$$Y_t = \alpha_0 + \alpha_1 t + \alpha_2 t^2$$

where Yₜ = sales at time t and t = time, measured in years. The first model postulates that sales is a linear
function of time, whereas the second model states that it is a quadratic function of time.
a. Discuss the properties of these models.
b. How would you decide between the two models?
c. In what situations will the quadratic model be useful?
d. Try to obtain data on automobile sales in the United States over the past 20 years and see which of
the models fits the data better.
8.2. Show that the F ratio of Eq. (8.4.16) is equal to the F ratio of Eq. (8.4.18). (Hint: ESS/TSS = R².)
8.3. Show that the F tests of Eq. (8.4.18) and Eq. (8.6.10) are equivalent.
8.4. Establish statements (8.6.11) and (8.6.12).
8.5. Consider the Cobb-Douglas production function

$$Y = \beta_1 L^{\beta_2} K^{\beta_3} \tag{1}$$

where Y = output, L = labor input, and K = capital input. Dividing (1) through by K, we get

$$(Y/K) = \beta_1 (L/K)^{\beta_2} K^{\beta_2 + \beta_3 - 1} \tag{2}$$
Taking the natural log of (2) and adding the error term, we obtain

$$\ln(Y/K) = \beta_0 + \beta_2 \ln(L/K) + (\beta_2 + \beta_3 - 1)\ln K + u_i \qquad (3)$$

where $\beta_0 = \ln \beta_1$.

a. Suppose you had data to run the regression (3). How would you test the hypothesis that there are constant returns to scale, i.e., $(\beta_2 + \beta_3) = 1$?

b. If there are constant returns to scale, how would you interpret regression (3)?

c. Does it make any difference whether we divide (1) by L rather than by K?
8.6. Critical values of R² when true R² = 0. Equation (8.4.11) gave the relationship between F and R² under the hypothesis that all partial slope coefficients are simultaneously equal to zero (i.e., R² = 0). Just as we can find the critical F value at the α level of significance from the F table, we can find the critical R² value from the following relation:

$$R^2 = \frac{(k-1)F}{(k-1)F + (n-k)}$$

where k is the number of parameters in the regression model including the intercept and where F is the critical F value at the α level of significance. If the observed R² exceeds the critical R² obtained from the preceding formula, we can reject the hypothesis that the true R² is zero.

Establish the preceding formula and find out the critical R² value (at α = 5 percent) for the regression (8.1.4).
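The critical-R² formula is obtained by solving $F = [R^2/(k-1)]/[(1-R^2)/(n-k)]$ for R². A minimal Python sketch of both directions follows. Its inputs are assumptions for illustration: regression (8.1.4) is taken to have n = 64 observations (the child mortality data) and k = 3 parameters, and the 5 percent critical F is taken as roughly 3.15, the tabulated value for 2 and 60 df, used as an approximation to the value for 2 and 61 df.

```python
def f_from_r2(r2, n, k):
    """Overall F implied by R^2: F = [R^2 / (k - 1)] / [(1 - R^2) / (n - k)]."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


def critical_r2(f_crit, n, k):
    """Critical R^2 from a critical F: (k - 1)F / [(k - 1)F + (n - k)]."""
    return (k - 1) * f_crit / ((k - 1) * f_crit + (n - k))


# Assumed inputs (see lead-in): 64 observations, 3 parameters, 5% F of about 3.15.
n, k, f_crit = 64, 3, 3.15
r2_crit = critical_r2(f_crit, n, k)
print(f"critical R^2 = {r2_crit:.4f}")
# Round trip: feeding the critical R^2 back into the F formula recovers f_crit.
print(f"implied F = {f_from_r2(r2_crit, n, k):.4f}")
```

With these assumed inputs the critical R² is about 0.094, so any observed R² above that value would lead to rejecting the hypothesis that the true R² is zero.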

8.7. From annual data for the years 1968-1987, the following regression results were obtained:

$$\hat{Y}_t = -859.92 + 0.6470 X_{2t} - 23.195 X_{3t} \qquad R^2 = 0.9776 \qquad (1)$$

$$\hat{Y}_t = -261.09 + 0.2452 X_{2t} \qquad R^2 = 0.9388 \qquad (2)$$

where Y = U.S. expenditure on imported goods, billions of 1982 dollars, $X_2$ = personal disposable income, billions of 1982 dollars, and $X_3$ = trend variable. True or false: The standard error of $X_3$ in (1) is 4.2750. Show your calculations. (Hint: Use the relationship between R², F, and t.)
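One way to follow the hint, sketched in Python: because regression (2) is regression (1) with the trend $X_3$ dropped, the incremental F for $X_3$ can be computed from the two R² values, and with a single restriction F = t². The sketch assumes n = 20 annual observations (1968-1987) and treats the reported R² values as exact, although they are rounded to four decimals.

```python
import math


def incremental_f(r2_full, r2_restricted, n, k_full, m=1):
    """F for dropping m regressors, from the R^2 of the full and restricted models."""
    return ((r2_full - r2_restricted) / m) / ((1.0 - r2_full) / (n - k_full))


n, k = 20, 3                                   # 1968-1987; model (1) has 3 parameters
f_stat = incremental_f(0.9776, 0.9388, n, k)   # model (2) drops only the trend X3
t_abs = math.sqrt(f_stat)                      # one restriction, so F = t^2
se_x3 = 23.195 / t_abs                         # se = |coefficient| / |t|
print(f"F = {f_stat:.3f}, |t| = {t_abs:.3f}, se(X3 coefficient) = {se_x3:.4f}")
```

The result, about 4.27, agrees with 4.2750 up to the rounding of the reported R² values.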

8.8. Suppose in the regression

$$\ln(Y_i/X_{2i}) = \alpha_1 + \alpha_2 \ln X_{2i} + \alpha_3 \ln X_{3i} + u_i$$

the values of the regression coefficients and their standard errors are known.* From this knowledge, how would you estimate the parameters and standard errors of the following regression model?

$$\ln Y_i = \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i$$


8.9. Assume the following:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{2i} X_{3i} + u_i$$

where Y is personal consumption expenditure, $X_2$ is personal income, and $X_3$ is personal wealth.† The term $(X_{2i} X_{3i})$ is known as the interaction term. What is meant by this expression? How would you test the hypothesis that the marginal propensity to consume (MPC) (i.e., $\beta_2$) is independent of the wealth of the consumer?


*Adapted from Peter Kennedy, A Guide to Econometrics, the MIT Press, 3d ed., Cambridge, Mass., 1992, p. 310.
†Ibid., p. 327.
8.10. You are given the following regression results:

$$\hat{Y}_i = 16{,}899 - 2978.5 X_{2i} \qquad R^2 = 0.6149$$
t = (8.5152)  (−4.7280)

$$\hat{Y}_i = 9734.2 - 3782.2 X_{2i} + 2815 X_{3i} \qquad R^2 = 0.7706$$
t = (3.3705)  (−6.6070)  (2.9712)

Can you find out the sample size underlying these results? (Hint: Recall the relationship between R², F, and t values.)
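A short Python sketch of the hint: in the one-regressor model $t^2 = F = R^2(n-2)/(1-R^2)$, which can be solved for n. As a cross-check, it assumes the first regression is the second one with $X_3$ dropped (same sample), so the squared t of $X_3$ equals the incremental F computed from the two R² values.

```python
def n_from_simple_regression(r2, t_slope):
    """One regressor: t^2 = F = R^2 (n - 2) / (1 - R^2); solve for n."""
    return 2.0 + t_slope ** 2 * (1.0 - r2) / r2


def n_from_added_regressor(r2_full, r2_short, t_added, k_full):
    """One added regressor: t^2 = (R2_full - R2_short)(n - k) / (1 - R2_full); solve for n."""
    return k_full + t_added ** 2 * (1.0 - r2_full) / (r2_full - r2_short)


n1 = n_from_simple_regression(0.6149, -4.7280)
n2 = n_from_added_regressor(0.7706, 0.6149, 2.9712, k_full=3)
print(f"from the first regression: {n1:.2f}; cross-check: {n2:.2f}")
```

Both routes give a value very close to 16, which supports reading the two regressions as nested models estimated on the same sample.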
8.11. Based on our discussion of individual and joint tests of hypothesis based, respectively, on the t and F tests, which of the following situations are likely?
1. Reject the joint null on the basis of the F statistic, but do not reject each separate null on the basis of the individual t tests.
2. Reject the joint null on the basis of the F statistic, reject one individual hypothesis on the basis of the t test, and do not reject the other individual hypotheses on the basis of the t test.
3. Reject the joint null hypothesis on the basis of the F statistic, and reject each separate null hypothesis on the basis of the individual t tests.
4. Do not reject the joint null on the basis of the F statistic, and do not reject each separate null on the basis of individual t tests.
5. Do not reject the joint null on the basis of the F statistic, reject one individual hypothesis on the basis of a t test, and do not reject the other individual hypotheses on the basis of the t test.
6. Do not reject the joint null on the basis of the F statistic, but reject each separate null on the basis of individual t tests.*




Empirical Exercises 


8.12. Refer to Exercise 7.21. 
a. What are the real income and interest rate elasticities of real cash balances? 
b. Are the preceding elasticities statistically significant individually? 
c. Test the overall significance of the estimated regression. 
d. Is the income elasticity of demand for real cash balances significantly different from unity? 
e. Should the interest rate variable be retained in the model? Why? 
8.13. From the data for 46 states in the United States for 1992, Baltagi obtained the following regression results:†

$$\widehat{\log C} = 4.30 - 1.34 \log P + 0.17 \log Y$$
se = (0.91) (0.32) (0.20)    $\bar{R}^2 = 0.27$
where C = cigarette consumption, packs per year
P = real price per pack
Y = real disposable income per capita
a. What is the elasticity of demand for cigarettes with respect to price? Is it statistically significant? If so, is it statistically different from 1?
b. What is the income elasticity of demand for cigarettes? Is it statistically significant? If not, what might be the reasons for it?
c. How would you retrieve R² from the adjusted R² given above?
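A minimal Python sketch of parts (a) to (c), assuming, as part (c) implies, that 0.27 is the adjusted R², with n = 46 and k = 3 (intercept plus two slopes, so 43 df for the t tests). Inverting $\bar{R}^2 = 1 - (1-R^2)(n-1)/(n-k)$ recovers R².

```python
def r2_from_adjusted(adj_r2, n, k):
    """Invert adj. R^2 = 1 - (1 - R^2)(n - 1)/(n - k) to recover R^2."""
    return 1.0 - (1.0 - adj_r2) * (n - k) / (n - 1)


def t_stat(estimate, hypothesized, se):
    """t statistic for H0: coefficient = hypothesized value."""
    return (estimate - hypothesized) / se


n, k = 46, 3   # 46 states; intercept plus two slope coefficients
r2 = r2_from_adjusted(0.27, n, k)
t_price_zero = t_stat(-1.34, 0.0, 0.32)    # price elasticity = 0?
t_price_unit = t_stat(-1.34, -1.0, 0.32)   # price elasticity = -1 (unit elastic)?
t_income_zero = t_stat(0.17, 0.0, 0.20)    # income elasticity = 0?
print(f"R^2 = {r2:.4f}")
print(f"t: price vs 0 = {t_price_zero:.3f}, price vs -1 = {t_price_unit:.3f}, "
      f"income vs 0 = {t_income_zero:.3f}")
```

Each t value would then be compared with the critical t for 43 df; "different from 1" is read here as different from -1 in signed form, since the price elasticity is negative.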


*Quoted from Ernst R. Berndt, The Practice of Econometrics: Classic and Contemporary, Addison-Wesley, Reading, Mass., 1991, p. 79.
†See Badi H. Baltagi, Econometrics, Springer-Verlag, New York, 1998, p. 111.
8.14. From a sample of 209 firms, Wooldridge obtained the following regression results:*

$$\widehat{\log(\text{salary})} = 4.32 + 0.280 \log(\text{sales}) + 0.0174\,\text{roe} + 0.00024\,\text{ros}$$
se = (0.32) (0.035) (0.0041) (0.00054)
R² = 0.283
where salary = salary of CEO
sales = annual firm sales
roe = return on equity in percent
ros = return on firm's stock
and where figures in the parentheses are the estimated standard errors.
a. Interpret the preceding regression taking into account any prior expectations that you may have about the signs of the various coefficients.
b. Which of the coefficients are individually statistically significant at the 5 percent level?
c. What is the overall significance of the regression? Which test do you use? And why?
d. Can you interpret the coefficients of roe and ros as elasticity coefficients? Why or why not?
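For parts (b) and (c), the individual t ratios and the overall F (computed from R²) can be sketched in Python as below, with n = 209 and k = 4 parameters, so 205 df. The critical values themselves (roughly 1.97 for a two-sided 5 percent t test with 205 df) still come from tables.

```python
estimates = {  # coefficient, standard error
    "log(sales)": (0.280, 0.035),
    "roe": (0.0174, 0.0041),
    "ros": (0.00024, 0.00054),
}
n, k, r2 = 209, 4, 0.283   # 209 firms; intercept plus three slopes
df = n - k

# t = coefficient / standard error, under H0: coefficient = 0.
t_values = {name: b / se for name, (b, se) in estimates.items()}
for name, t in t_values.items():
    print(f"{name:>10}: t = {t:6.2f}")

# Overall significance: F = [R^2/(k-1)] / [(1-R^2)/(n-k)] with (k-1, n-k) df.
f_overall = (r2 / (k - 1)) / ((1.0 - r2) / df)
print(f"F({k - 1}, {df}) = {f_overall:.2f}")
```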
8.15. Assuming that Y and $X_2, X_3, \ldots, X_k$ are jointly normally distributed and assuming that the null hypothesis is that the population partial correlations are individually equal to zero, R. A. Fisher has shown that

$$t = \frac{r\sqrt{n-k-2}}{\sqrt{1-r^2}}$$

follows the t distribution with n − k − 2 df, where r is the kth-order partial correlation coefficient, k is its order, and n is the total number of observations. (Note: $r_{12.3}$ is a first-order partial correlation coefficient, $r_{12.34}$ is a second-order partial correlation coefficient, and so on.) Refer to Exercise 7.2. Assuming Y and $X_2$ and $X_3$ to be jointly normally distributed, compute the three partial correlations $r_{12.3}$, $r_{13.2}$, and $r_{23.1}$ and test their significance under the hypothesis that the corresponding population correlations are individually equal to zero.
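A small Python sketch of the computation, using the standard first-order partial correlation formula $r_{ab.c} = (r_{ab} - r_{ac} r_{bc}) / \sqrt{(1 - r_{ac}^2)(1 - r_{bc}^2)}$ together with Fisher's t. The simple correlations and n below are hypothetical placeholders, not the Exercise 7.2 data.

```python
import math


def partial_ab_c(r_ab, r_ac, r_bc):
    """First-order partial correlation r_{ab.c}: a and b, holding c constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1.0 - r_ac ** 2) * (1.0 - r_bc ** 2))


def fisher_t(r_partial, n, order):
    """t = r * sqrt(n - k - 2) / sqrt(1 - r^2), where k is the order of the partial."""
    return r_partial * math.sqrt(n - order - 2) / math.sqrt(1.0 - r_partial ** 2)


# Hypothetical simple correlations and sample size (NOT the Exercise 7.2 data).
r12, r13, r23, n = 0.8, 0.6, 0.5, 15
r12_3 = partial_ab_c(r12, r13, r23)
t12_3 = fisher_t(r12_3, n, order=1)   # compare with the critical t for n - 3 df
print(f"r12.3 = {r12_3:.4f}, t = {t12_3:.3f} with {n - 3} df")
```

The other two partials follow by permuting the arguments: `partial_ab_c(r13, r12, r23)` gives $r_{13.2}$ and `partial_ab_c(r23, r12, r13)` gives $r_{23.1}$.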

8.16. In studying the demand for farm tractors in the United States for the periods 1921-1941 and 1948-1957, Griliches† obtained the following results:

$$\widehat{\log Y_t} = \text{constant} - 0.519 \log X_{2t} - 4.933 \log X_{3t} \qquad R^2 = 0.793$$
(0.231) (0.477)

where $Y_t$ = value of stock of tractors on farms as of January 1, in 1935-1939 dollars, $X_2$ = index of prices paid for tractors divided by an index of prices received for all crops at time t − 1, and $X_3$ = interest rate prevailing in year t − 1. The estimated standard errors are given in the parentheses.
a. Interpret the preceding regression.
b. Are the estimated slope coefficients individually statistically significant? Are they significantly different from unity?
c. Use the analysis of variance technique to test the significance of the overall regression. Hint: Use the R² variant of the ANOVA technique.
d. How would you compute the interest-rate elasticity of demand for farm tractors?
e. How would you test the significance of estimated R²?




*See Jeffrey M. Wooldridge, Introductory Econometrics, South-Western Publishing Co., 2000, pp. 154-155.

†Z. Griliches, “The Demand for a Durable Input: Farm Tractors in the United States, 1921-1957,” in The Demand for Durable Goods, Arnold C. Harberger (ed.), The University of Chicago Press, Chicago, 1960, Table 1, p. 192.
8.17. Consider the following wage-determination equation for the British economy for the period 1950-1969:*

$$\hat{W}_t = 8.582 + 0.364(PF)_t + 0.004(PF)_{t-1} - 2.560U_t$$
(1.129) (0.080) (0.072) (0.658)

R² = 0.873   df = 15
where W = wages and salaries per employee
PF = prices of final output at factor cost
U = unemployment in Great Britain as a percentage of the total number of employees in Great Britain
t = time
(The figures in the parentheses are the estimated standard errors.)
a. Interpret the preceding equation.
b. Are the estimated coefficients individually significant?
c. What is the rationale for the introduction of $(PF)_{t-1}$?
d. Should the variable $(PF)_{t-1}$ be dropped from the model? Why?
e. How would you compute the elasticity of wages and salaries per employee with respect to the unemployment rate U?

8.18. A variation of the wage-determination equation given in Exercise 8.17 is as follows:†

$$\hat{W}_t = 1.073 + 5.288V_t - 0.116X_t + 0.054M_t + 0.046M_{t-1}$$
(0.797) (0.812) (0.111) (0.022) (0.019)

R² = 0.934   df = 14

where W = wages and salaries per employee
V = unfilled job vacancies in Great Britain as a percentage of the total number of employees in Great Britain
X = gross domestic product per person employed
M = import prices
$M_{t-1}$ = import prices in the previous (or lagged) year
(The estimated standard errors are given in the parentheses.)
a. Interpret the preceding equation.
b. Which of the estimated coefficients are individually statistically significant?
c. What is the rationale for the introduction of the X variable? A priori, is the sign of X expected to be negative?
d. What is the purpose of introducing both $M_t$ and $M_{t-1}$ in the model?
e. Which of the variables may be dropped from the model? Why?
f. Test the overall significance of the observed regression.
8.19. For the demand for potatoes function estimated in Eq. (8.6.24), is the estimated income elasticity equal to 1? Is the price elasticity equal to -1?
8.20. For the demand function in Eq. (8.6.24), how would you test the hypothesis that the income elasticity is equal in value but opposite in sign to the price elasticity of demand? Show the necessary calculations. (Note: $\text{cov}(\hat{\beta}_2, \hat{\beta}_3) = -0.00142$.)
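The hypothesis in Exercise 8.20 is $H_0\colon \beta_2 + \beta_3 = 0$, which can be tested with a t test of the kind in Eq. (8.6.4): $t = (\hat\beta_2 + \hat\beta_3)/\sqrt{\text{var}(\hat\beta_2) + \text{var}(\hat\beta_3) + 2\,\text{cov}(\hat\beta_2, \hat\beta_3)}$. The Python sketch below takes $\beta_2$ as the income elasticity and $\beta_3$ as the price elasticity (an assumption about the labeling in Eq. (8.6.24)). Only the covariance comes from the exercise; the estimates and standard errors are hypothetical placeholders to be replaced with the Eq. (8.6.24) values.

```python
import math


def t_sum_zero(b2, b3, se2, se3, cov23):
    """t statistic for H0: beta2 + beta3 = 0 (elasticities equal and opposite)."""
    var_sum = se2 ** 2 + se3 ** 2 + 2.0 * cov23   # var(b2 + b3)
    return (b2 + b3) / math.sqrt(var_sum)


# Hypothetical estimates and standard errors; cov23 is the value given in the exercise.
b2, se2 = 0.40, 0.05    # income elasticity (placeholder)
b3, se3 = -0.30, 0.06   # price elasticity (placeholder)
cov23 = -0.00142
t = t_sum_zero(b2, b3, se2, se3, cov23)
print(f"t = {t:.3f}")
```

The statistic is compared with the critical t for the residual df of Eq. (8.6.24). Ignoring the covariance term would misstate the standard error of the sum.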


*Taken from Prices and Earnings in 1951-1969: An Econometric Assessment, Dept. of Employment, HMSO, 1971, Eq. (19), p. 35.

†Ibid., Eq. (67), p. 37.
8.21. Refer to the demand for chicken function of Exercise 7.16. Confining your considerations to the logarithmic specification,
a. What is the estimated own-price elasticity of demand (i.e., elasticity with respect to the price of chicken)?
b. Is it statistically significant?
c. If so, is it significantly different from unity?
d. A priori, what are the expected signs of the coefficients of the price of fish and income variables? Are the empirical results in accord with these expectations?
e. If the coefficients of these two variables are statistically insignificant, what may be the reasons?
8.22. Refer to Exercise 7.17 relating to wildcat activity.
a. Is each of the estimated slope coefficients individually statistically significant at the 5 percent level?
b. Would you reject the hypothesis that R² = 0?
c. What is the instantaneous rate of growth of wildcat activity over the period 1948-1978? The corresponding compound rate of growth?
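If the Exercise 7.17 regression has the log of wildcat activity on the left and a time trend among the regressors (an assumption; check the specification there), the trend coefficient is the instantaneous growth rate and exp(coefficient) − 1 is the compound rate. A minimal Python sketch with a hypothetical coefficient:

```python
import math


def growth_rates(trend_coef):
    """From ln Y_t = b1 + b2*t + ...: instantaneous rate b2, compound rate exp(b2) - 1."""
    return trend_coef, math.exp(trend_coef) - 1.0


# Hypothetical trend coefficient of 0.02 per year (not the Exercise 7.17 estimate).
instantaneous, compound = growth_rates(0.02)
print(f"instantaneous: {instantaneous:.2%} per year, compound: {compound:.3%} per year")
```

For small coefficients the two rates are close; the compound rate is always slightly larger for a positive trend.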
8.23. Refer to the U.S. defense budget outlay regression estimated in Exercise 7.18.
a. Comment generally on the estimated regression results. 
b. Set up the ANOVA table and test the hypothesis that all the partial slope coefficients are zero. 
8.24. The following is known as the transcendental production function (TPF), a generalization of the well-known Cobb-Douglas production function:

$$Y_i = \beta_1 L_i^{\beta_2} K_i^{\beta_3} e^{\beta_4 L_i + \beta_5 K_i}$$

where Y = output, L = labor input, and K = capital input.
After taking logarithms and adding the stochastic disturbance term, we obtain the stochastic TPF as

$$\ln Y_i = \beta_0 + \beta_2 \ln L_i + \beta_3 \ln K_i + \beta_4 L_i + \beta_5 K_i + u_i$$

where $\beta_0 = \ln \beta_1$.

a. What are the properties of this function?

b. For the TPF to reduce to the Cobb-Douglas production function, what must be the values of $\beta_4$ and $\beta_5$?

c. If you had the data, how would you go about finding out whether the TPF reduces to the Cobb-Douglas production function? What testing procedure would you use?

d. See if the TPF fits the data given in Table 8.8. Show your calculations.

8.25. Energy prices and capital formation: United States, 1948-1978. To test the hypothesis that a rise in the price of energy relative to output leads to a decline in the productivity of existing capital and labor resources, John A. Tatom estimated the following production function for the United States for the quarterly period 1948-I to 1978-II:*

$$\widehat{\ln(y/k)} = 1.5492 + 0.7135 \ln(h/k) - 0.1081 \ln(P_e/P) + 0.0045t \qquad R^2 = 0.98$$
t = (16.33) (21.69) (−6.42) (15.86)

where y = real output in the private business sector
k = a measure of the flow of capital services
h = person hours in the private business sector
$P_e$ = producer price index for fuel and related products
P = private business sector price deflator
t = time


*See his “Energy Prices and Capital Formation: 1972-1977,” Review, Federal Reserve Bank of St. Louis, vol. 61, no. 5, May 1979, p. 4.
The numbers in parentheses are t statistics.

a. Do the results support the author's hypothesis?

b. Between 1972 and 1977 the relative price of energy, $(P_e/P)$, increased by 60 percent. From the estimated regression, what is the loss in productivity?

c. After allowing for the changes in (h/k) and $(P_e/P)$, what has been the trend rate of growth of productivity over the sample period?

d. How would you interpret the coefficient value of 0.7135?

e. Does the fact that each estimated partial slope coefficient is individually statistically significant (why?) mean we can reject the hypothesis that R² = 0? Why or why not?
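Parts (b) and (c) reduce to arithmetic on the reported coefficients, sketched in Python below. Productivity here means y/k with h/k held constant. For a 60 percent rise the simple elasticity approximation (-0.1081 × 60%) and the exact change implied by the log-log form, $1.6^{-0.1081} - 1$, differ noticeably. The annual compounding of the quarterly trend is an added illustration, not part of the exercise.

```python
import math

elasticity = -0.1081   # coefficient of ln(P_e/P)
price_rise = 0.60      # 60 percent rise in P_e/P, 1972-1977
trend = 0.0045         # coefficient of t; the data are quarterly

# Elasticity approximation: %change in y/k ~ elasticity * %change in P_e/P.
approx_loss = elasticity * price_rise
# Exact change implied by the log-log form: (1 + 0.60) ** elasticity - 1.
exact_loss = (1.0 + price_rise) ** elasticity - 1.0
# Trend growth is 0.45 percent per quarter; compounded over four quarters:
annual_trend = math.exp(4 * trend) - 1.0
print(f"approx: {approx_loss:.2%}, exact: {exact_loss:.2%}, annual trend: {annual_trend:.2%}")
```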

8.26. The demand for cable. Table 8.10 gives data used by a telephone cable manufacturer to predict sales to a major customer for the period 1968-1983.†
The variables in the table are defined as follows:
Y = annual sales in MPF, million paired feet

$X_2$ = gross national product (GNP), billions of dollars

$X_3$ = housing starts, thousands of units

$X_4$ = unemployment rate, %

$X_5$ = prime rate lagged 6 months

$X_6$ = customer line gains, %

Table 8.10 Regression Variables 


Year    X2       X3        X4            X5            X6              Y
        GNP      Housing   Unemploy-     Prime rate    Customer line   Annual
                 starts    ment, %       lag, 6 mos.   gains, %        sales (MPF)
1968    1051.8   1503.6    3.6           5.8           5.9             5873
1969    1078.8   1486.7    3.5           6.7           4.5             7852
1970    1075.3   1434.8    5.0           8.4           4.2             8189
1971    1107.5   2035.6    6.0           6.2           4.2             7497
1972    1171.1   2360.8    5.6           5.4           4.9             8534
1973    1235.0   2043.9    4.9           5.9           5.0             8688
1974    1217.8   1331.9    5.6           9.4           4.1             7270
1975    1202.3   1160.0    8.5           9.4           3.4             5020
1976    1271.0   1535.0    7.7           7.2           4.2             6035
1977    1332.7   1961.8    7.0           6.6           4.5             7425
1978    1399.2   2009.3    6.0           7.6           3.9             9400
1979    1431.6   1721.9    6.0           10.6          4.4             9350
1980    1480.7   1298.0    7.2           14.9          3.9             6540
1981    1510.3   1100.0    7.6           16.6          Sal             7675
1982    1492.2   1039.0    9.7           IS            0.6             7419
1983    1535.4   1200.0    8.8           16.0          is              7923


You are to consider the following model: 
$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \beta_4 X_{4t} + \beta_5 X_{5t} + \beta_6 X_{6t} + u_t$$


a. Estimate the preceding regression. 
b. What are the expected signs of the coefficients of this model? 


†I am indebted to Daniel J. Reardon for collecting and processing the data.
c. Are the empirical results in accordance with prior expectations?

d. Are the estimated partial regression coefficients individually statistically significant at the 5 percent level of significance?

e. Suppose you first regress Y on $X_2$, $X_3$, and $X_4$ only and then decide to add the variables $X_5$ and $X_6$. How would you find out if it is worth adding the variables $X_5$ and $X_6$? Which test do you use? Show the necessary calculations.

8.27. Marc Nerlove has estimated the following cost function for electricity generation:*

$$Y = A X^{\beta} P_1^{\alpha_1} P_2^{\alpha_2} P_3^{\alpha_3} u \qquad (1)$$

where Y = total cost of production
X = output in kilowatt hours
$P_1$ = price of labor input
$P_2$ = price of capital input
$P_3$ = price of fuel
u = disturbance term
Theoretically, the sum of the price elasticities is expected to be unity, i.e., $(\alpha_1 + \alpha_2 + \alpha_3) = 1$. By imposing this restriction, the preceding cost function can be written as

$$(Y/P_3) = A X^{\beta} (P_1/P_3)^{\alpha_1} (P_2/P_3)^{\alpha_2} u \qquad (2)$$

In other words, (1) is an unrestricted and (2) is the restricted cost function.
On the basis of a sample of 29 medium-sized firms, and after logarithmic transformation, Nerlove obtained the following regression results:

$$\widehat{\ln Y_i} = -4.93 + 0.94 \ln X_i + 0.31 \ln P_1 - 0.26 \ln P_2 + 0.44 \ln P_3 \qquad (3)$$
se = (1.96) (0.11) (0.23) (0.29) (0.07)   RSS = 0.336

$$\widehat{\ln(Y/P_3)} = -6.55 + 0.91 \ln X + 0.51 \ln(P_1/P_3) + 0.09 \ln(P_2/P_3) \qquad (4)$$
se = (0.16) (0.11) (0.19) (0.16)   RSS = 0.364

a. Interpret Eqs. (3) and (4).
b. How would you find out if the restriction $(\alpha_1 + \alpha_2 + \alpha_3) = 1$ is valid? Show your calculations.
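Because (4) is just (3) with the restriction imposed, and its residuals are in the same units as those of ln Y, the restricted-least-squares F test applies directly to the two RSS values: $F = [(RSS_R - RSS_{UR})/m]/[RSS_{UR}/(n-k)]$. A minimal Python sketch with n = 29, k = 5 parameters in (3), and m = 1 restriction:

```python
def restricted_f(rss_r, rss_ur, m, n, k_ur):
    """F = [(RSS_R - RSS_UR)/m] / [RSS_UR/(n - k_UR)], with m linear restrictions."""
    return ((rss_r - rss_ur) / m) / (rss_ur / (n - k_ur))


# Nerlove's figures: 29 firms, 5 parameters in (3), one restriction on the alphas.
f_stat = restricted_f(rss_r=0.364, rss_ur=0.336, m=1, n=29, k_ur=5)
print(f"F(1, 24) = {f_stat:.3f}")
```

The value, about 2.0, is below the 5 percent critical F for (1, 24) df (about 4.26 in standard tables), so on these rounded RSS figures the restriction would not be rejected.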
8.28. Estimating the capital asset pricing model (CAPM). In Section 6.1 we considered briefly the well-known capital asset pricing model of modern portfolio theory. In empirical analysis, the CAPM is estimated in two stages.

Stage I (Time-series regression). For each of the N securities included in the sample, we run the following regression over time:

$$R_{it} = \hat{\alpha}_i + \hat{\beta}_i R_{mt} + e_{it} \qquad (1)$$

where $R_{it}$ and $R_{mt}$ are the rates of return on the ith security and on the market portfolio (say, the S&P 500) in year t; $\beta_i$, as noted elsewhere, is the Beta or market volatility coefficient of the ith security, and $e_{it}$ are the residuals. In all there are N such regressions, one for each security, giving therefore N estimates of $\beta_i$.


*Marc Nerlove, “Returns to Scale in Electric Supply,” in Carl Christ, ed., Measurement in Economics, Stanford University Press, Palo Alto, Calif., 1963. The notation has been changed.
Stage II (Cross-section regression). In this stage we run the following regression over the N securities:

$$\bar{R}_i = \hat{\gamma}_1 + \hat{\gamma}_2 \hat{\beta}_i + u_i \qquad (2)$$

where $\bar{R}_i$ is the average or mean rate of return for security i computed over the sample period covered by Stage I, $\hat{\beta}_i$ is the estimated beta coefficient from the first-stage regression, and $u_i$ is the residual term.

Comparing the second-stage regression (2) with the CAPM Eq. (6.1.2), written as

$$ER_i = r_f + \beta_i (ER_m - r_f) \qquad (3)$$

where $r_f$ is the risk-free rate of return, we see that $\hat{\gamma}_1$ is an estimate of $r_f$ and $\hat{\gamma}_2$ is an estimate of $(ER_m - r_f)$, the market risk premium.
Thus, in the empirical testing of CAPM, $\bar{R}_i$ and $\hat{\beta}_i$ are used as estimators of $ER_i$ and $\beta_i$, respectively. Now if CAPM holds, statistically,

$$\hat{\gamma}_1 = r_f$$
$$\hat{\gamma}_2 = \bar{R}_m - r_f, \text{ the estimator of } (ER_m - r_f)$$

Next consider an alternative model:

$$\bar{R}_i = \hat{\gamma}_1 + \hat{\gamma}_2 \hat{\beta}_i + \hat{\gamma}_3 s_{e_i}^2 + u_i \qquad (4)$$

where $s_{e_i}^2$ is the residual variance of the ith security from the first-stage regression. Then, if CAPM is valid, $\hat{\gamma}_3$ should not be significantly different from zero.


To test the CAPM, Levy ran regressions (2) and (4) on a sample of 101 stocks for the period 1948-1968 and obtained the following results:*

$$\hat{\bar{R}}_i = 0.109 + 0.037 \hat{\beta}_i \qquad (2)'$$
se = (0.009) (0.008)
t = (12.1) (4.6)   R² = 0.21

$$\hat{\bar{R}}_i = 0.106 + 0.0024 \hat{\beta}_i + 0.201 s_{e_i}^2 \qquad (4)'$$
se = (0.008) (0.007) (0.038)
t = (13.2) (3.3) (5.3)   R² = 0.39


a. Are these results supportive of the CAPM?

b. Is it worth adding the variable $s_{e_i}^2$ to the model? How do you know?

c. If the CAPM holds, $\hat{\gamma}_1$ in (2)' should approximate the average value of the risk-free rate, $r_f$. The estimated value is 10.9 percent. Does this seem a reasonable estimate of the risk-free rate of return during the observation period, 1948-1968? (You may consider the rate of return on Treasury bills or a similar comparatively risk-free asset.)

d. If the CAPM holds, the market risk premium $(\bar{R}_m - r_f)$ from (2)' is about 3.7 percent. If $r_f$ is assumed to be 10.9 percent, this implies $\bar{R}_m$ for the sample period was about 14.6 percent. Does this sound like a reasonable estimate?

e. What can you say about the CAPM generally?


*H. Levy, “Equilibrium in an Imperfect Market: A Constraint on the Number of Securities in the Portfolio,” American Economic Review, vol. 68, no. 4, September 1978, pp. 643-658.
8.29. Refer to Exercise 7.21c. Now that you have the necessary tools, which test(s) would you use to choose between the two models? Show the necessary computations. Note that the dependent variables in the two models are different.

8.30. Refer to Example 8.3. Use the t test as shown in Eq. (8.6.4) to find out if there were constant returns to scale in the Indian manufacturing sector for the period of the study.

8.31. Return to the child mortality example that we have discussed several times. In regression (7.6.2) we regressed child mortality (CM) on per capita GNP (PGNP) and female literacy rate (FLR). Now we extend this model by including total fertility rate (TFR). The data on all these variables are already given in Table 6.4. We reproduce regression (7.6.2) and give results of the extended regression model below:

1. $\widehat{CM}_i = 263.6416 - 0.0056\,PGNP_i - 2.2316\,FLR_i$   (7.6.2)
se = (11.5932) (0.0019) (0.2099)   R² = 0.7077
2. $\widehat{CM}_i = 168.3067 - 0.0055\,PGNP_i - 1.7680\,FLR_i + 12.8686\,TFR_i$
se = (32.8916) (0.0018) (0.2480) (?)
R² = 0.7474

a. How would you interpret the coefficient of TFR? A priori, would you expect a positive or negative relationship between CM and TFR? Justify your answer.

b. Have the coefficient values of PGNP and FLR changed between the two equations? If so, what may be the reason(s) for such a change? Is the observed difference statistically significant? Which test do you use and why?

c. How would you choose between models 1 and 2? Which statistical test would you use to answer this question? Show the necessary calculations.

d. We have not given the standard error of the coefficient of TFR. Can you find it out? (Hint: Recall the relationship between the t and F distributions.)

8.32. Return to Exercise 1.7, which gave data on advertising impressions retained and advertising expenditure for a sample of 21 firms. In Exercise 5.11 you were asked to plot these data and decide on an appropriate model about the relationship between impressions and advertising expenditure. Letting Y represent impressions retained and X the advertising expenditure, the following regressions were obtained:

Model I: $\hat{Y}_i = 22.163 + 0.3631 X_i$
se = (7.089) (0.0971)   r² = 0.424

Model II: $\hat{Y}_i = 7.059 + 1.0847 X_i - 0.0040 X_i^2$
se = (9.986) (0.3699) (0.0019)   R² = 0.53

a. Interpret both models.

b. Which is a better model? Why?

c. Which statistical test(s) would you use to choose between the two models?

d. Are there “diminishing returns” to advertising expenditure, that is, after a certain level of advertising expenditure (the saturation level), does it not pay to advertise? Can you find out what that level of expenditure might be? Show the necessary calculations.
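A minimal Python sketch for parts (c) and (d). In Model II, $dY/dX = 1.0847 - 2(0.0040)X$, which is zero at the saturation level; the negative coefficient on $X^2$ makes this a maximum. Because both models have the same dependent variable, an incremental F test for adding $X^2$ can be computed from the two R² values, here with n = 21 and k = 3 parameters in Model II.

```python
b_x, b_x2 = 1.0847, -0.0040   # Model II coefficients on X and X^2

# dY/dX = b_x + 2*b_x2*X = 0  =>  X* = -b_x / (2*b_x2); b_x2 < 0, so this is a maximum.
x_saturation = -b_x / (2.0 * b_x2)

# Incremental F for adding X^2 to Model I (one restriction), from the R^2 values.
n, k = 21, 3
r2_model_1, r2_model_2 = 0.424, 0.53
f_add = (r2_model_2 - r2_model_1) / ((1.0 - r2_model_2) / (n - k))
print(f"saturation level: {x_saturation:.2f}, F(1, {n - k}) = {f_add:.3f}")
```

The F value, about 4.06, is close to the square of the t ratio of the $X^2$ coefficient (−0.0040/0.0019), as it should be up to rounding of the reported figures.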

8.33. In regression (7.9.4), we presented the results of the Cobb-Douglas production function fitted to the manufacturing sector of all 50 states and Washington, DC, for 2005. On the basis of that regression, find out if there are constant returns to scale in that sector, using
a. The t test given in Eq. (8.6.4). You are told that the covariance between the two slope estimators is 0.03843.

b. The F test given in Eq. (8.6.9).

c. Is there a difference in the two test results? And what is your conclusion regarding the returns to scale in the manufacturing sector of the 50 states and Washington, DC, over the sample period?

8.34. Reconsider the savings-income regression in Section 8.7. Suppose we divide the sample into two periods as 1974-75 to 1989-90 and 1990-91 to 1995-96. Using the Chow test, decide if there is a structural change in the savings-income regression in the two periods. Comparing your results with those given in Section 8.7, what overall conclusion do you draw about the sensitivity of the Chow test to the choice of the break point that divides the sample into two (or more) periods?
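The Chow statistic compares the pooled (restricted) RSS with the sum of the two subperiod RSS values: $F = [(RSS_R - RSS_{UR})/k]/[RSS_{UR}/(n_1 + n_2 - 2k)]$, where $RSS_{UR} = RSS_1 + RSS_2$. A minimal Python sketch follows. The RSS values are hypothetical placeholders. $n_1 = 16$ and $n_2 = 6$ correspond to the split in Exercise 8.34, and k = 2 for an intercept and slope. Like the test itself, it assumes equal error variances in the two subperiods.

```python
def chow_f(rss_pooled, rss_1, rss_2, n1, n2, k):
    """Chow F = [(RSS_R - RSS_UR)/k] / [RSS_UR/(n1 + n2 - 2k)], RSS_UR = RSS_1 + RSS_2."""
    rss_ur = rss_1 + rss_2
    return ((rss_pooled - rss_ur) / k) / (rss_ur / (n1 + n2 - 2 * k))


# Hypothetical RSS values for illustration only; replace them with the fitted regressions.
n1, n2, k = 16, 6, 2
f_stat = chow_f(rss_pooled=1000.0, rss_1=300.0, rss_2=200.0, n1=n1, n2=n2, k=k)
print(f"F({k}, {n1 + n2 - 2 * k}) = {f_stat:.3f}")
```

Rerunning the same function for different break points is a direct way to see how sensitive the conclusion is to where the sample is split.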

8.35. Refer to Exercise 7.24 and the data in Table 7.12 concerning four economic variables in the U.S. from 1947-2000.

a. Based on the regression of consumption expenditure on real income, real wealth, and real interest rate, find out which of the regression coefficients are individually statistically significant at the 5 percent level of significance. Are the signs of the estimated coefficients in accord with economic theory?

b. Based on the results in (a), how would you estimate the income, wealth, and interest rate elasticities? What additional information, if any, do you need to compute the elasticities?

c. How would you test the hypothesis that the income and wealth elasticities are the same? Show the necessary calculations.

d. Suppose instead of the linear consumption function estimated in (a), you regress the logarithm of consumption expenditure on the logarithms of income and wealth and the interest rate. Show the regression results. How would you interpret the results?

e. What are the income and wealth elasticities estimated in (d)? How would you interpret the coefficient of the interest rate estimated in (d)?

f. In the regression in (d), could you have used the logarithm of the interest rate instead of the interest rate? Why or why not?

g. How would you compare the elasticities estimated in (b) and in (d)?

h. Between the regression models estimated in (a) and (d), which would you prefer? Why?

i. Suppose instead of estimating the model given in (d), you only regress the logarithm of consumption expenditure on the logarithm of income. How would you decide if it is worth adding the logarithm of wealth in the model? And how would you decide if it is worth adding both the logarithm of wealth and interest rate variables in the model? Show the necessary calculations.

8.36. Refer to Section 8.7 and the data in Table 8.9 concerning disposable personal income and personal savings for the period 1974-75 to 1995-96. In that section, the Chow test was introduced to see if a structural change occurred within the data between two time periods. Table 8.11 includes updated data containing the values from 1974-75 to 2004-05. Split the data into three sections: (1) 1974-75 to 1988-89, (2) 1989-90 to 1996-97, and (3) 1997-98 to 2004-05, with the last section representing the boom in the information technology and BPO sector.

a. Estimate both the model for the full dataset (years 1974-75 to 2004-05) and the third section (post-1997). Using the Chow test, determine if there is a significant break between the third period and the full dataset.

b. With this new data in Table 8.11, determine if there is still a significant difference between the first set of years (1974-75 to 1988-89) and the full dataset, now that there are more observations available.

c. Perform the Chow test on the middle period (1989-90 to 1996-97) versus the full dataset to see if the data in this period behave significantly differently than the rest of the data.




292 Basic Econometrics 


Table 8.11 Savings and Personal Disposable Income, India, 1974-75 to 2004-05 


Year      Savings    Income       Year      Savings    Income
1974-75   12,298     64,968       1990-91   130,010    465,097
1975-76   14,196     69,233       1991-92   141,089    S511515
1976-77   17,320     73,824       1992-93   159,682    618,587
1977-78   197995     85,267       1993-94   189,933    716,964
1978-79   23,601     91,507       1994-95   247,462    842,261
1979-80   24,213     99,632       1995-96   291,002    959,733
1980-81   26,881     123,067      1996-97   313,068    1,145,206
1981-82   30,896     142,181      1997-98   363,506    1,263,982
1982-83   esos       157,291      1998-99   389,747    1,474,404
1983-84   38,091     185,749      1999-00   484,256    1,617,965
1984-85   45,453     207,491      2000-01   499,033    1,773,250
1985-86   53,389     229,527      2001-02   534,885    1,954,839
1986-87   58,036     256,413      2002-03   646,521    2,064,839
1987-88   72,264     291,585      2003-04   820,685    2,282,148
1988-89   87,166     345,011      2004-05   997,873    2,495,015
1989-90   106,092    395,239


Source: Handbook of Statistics on Indian Economy (2009-10), Reserve Bank of India, Mumbai 
Note: Income = Personal disposable income measured in Rupee crore in 1999-2000 prices 
Savings = Gross domestic savings measured in Rupee crore, 1999-2000 prices 


Key to Multiple Choice Questions 


1. (b) 2. (b) 3. (a) 4. (c) 5. (d) 6. (c) 7. (d) 8. (b) 9. (c)
10. (d) 11. (b) 12. (a) 13. (c) 14. (b) 15. (b) 16. (a) 17. (b) 18. (d)
19. (c) 20. (c)


Appendix 8A*
Likelihood Ratio (LR) Test

The LR test is based on the maximum likelihood (ML) principle discussed in Appendix 4A, where we showed how one obtains the ML estimators of the two-variable regression model. The principle can be straightforwardly extended to the multiple regression model. Under the assumption that the disturbances $u_i$ are normally distributed, we showed that, for the two-variable regression model, the OLS and ML estimators of the regression coefficients are identical, but the estimated error variances are different. The OLS estimator of $\sigma^2$ is $\sum \hat{u}_i^2/(n-2)$ but the ML estimator is $\sum \hat{u}_i^2/n$, the former being unbiased and the latter biased, although in large samples the bias tends to disappear.


*Optional. 
The same is true in the multiple regression case. To illustrate, consider the three-variable regression model:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \qquad (1)$$

Corresponding to Eq. (5) of Appendix 4A, the log-likelihood function for the model (1) can be written as:

$$\ln LF = -\frac{n}{2}\ln(\sigma^2) - \frac{n}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\sum (Y_i - \beta_1 - \beta_2 X_{2i} - \beta_3 X_{3i})^2 \qquad (2)$$

As shown in Appendix 4A, differentiating this function with respect to $\beta_1$, $\beta_2$, $\beta_3$, and $\sigma^2$, setting the resulting expressions to zero, and solving, we obtain the ML estimators of these parameters. The ML estimators of $\beta_1$, $\beta_2$, and $\beta_3$ will be identical to OLS estimators, which are already given in Eqs. (7.4.6) to (7.4.8), but the error variance will be different in that the residual sum of squares (RSS) will be divided by n rather than by (n − 3), as in the case of OLS.
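A short Python sketch of a consequence that is standard under normal errors, though not stated in the text: substituting the ML estimator $\hat\sigma^2 = RSS/n$ into (2) gives the maximized value $\ln LF = -\frac{n}{2}[\ln(2\pi) + \ln(RSS/n) + 1]$. The statistic $2(\text{ULLF} - \text{RLLF})$ used later in this appendix therefore simplifies to $n \ln(RSS_R/RSS_{UR})$. The RSS values and n below are hypothetical.

```python
import math


def max_log_likelihood(rss, n):
    """ln LF at the ML estimates, using sigma2_ml = RSS/n:
    ln LF = -(n/2) * [ln(2*pi) + ln(RSS/n) + 1]."""
    sigma2_ml = rss / n
    return -0.5 * n * (math.log(2.0 * math.pi) + math.log(sigma2_ml) + 1.0)


def lr_from_rss(rss_restricted, rss_unrestricted, n):
    """lambda = 2(ULLF - RLLF); algebraically equal to n * ln(RSS_R / RSS_UR)."""
    return 2.0 * (max_log_likelihood(rss_unrestricted, n)
                  - max_log_likelihood(rss_restricted, n))


# Hypothetical values, for illustration only.
n, k = 50, 3
rss_ur, rss_r = 120.0, 150.0
print(f"OLS sigma^2 = {rss_ur / (n - k):.4f}, ML sigma^2 = {rss_ur / n:.4f}")
print(f"LR = {lr_from_rss(rss_r, rss_ur, n):.4f}, n*ln(RSS_R/RSS_UR) = {n * math.log(rss_r / rss_ur):.4f}")
```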

Now let us suppose that our null hypothesis Hy is that 83. the coefficient of X;, is zero. In this case, log LF given in 
(2) will become 


1 
In LF = = in (o?) — 5 In(2m) — => Dh — Bi oa (3) 


Equation (3) is known as the restricted log-likelihood function (RLLF) because it is estimated with the restriction
that a priori $B_3$ is zero, whereas Eq. (2) is known as the unrestricted log LF (ULLF) because a priori there are no
restrictions put on the parameters. To test the validity of the a priori restriction that $B_3$ is zero, the LR test obtains the following
test statistic:


$\lambda = 2(\text{ULLF} - \text{RLLF})$    (4)*

where ULLF and RLLF are, respectively, the unrestricted log-likelihood function (Eq. [2]) and the restricted log-likelihood
function (Eq. [3]). If the sample size is large, it can be shown that the test statistic $\lambda$ given in Eq. (4) follows the chi-square
($\chi^2$) distribution with df equal to the number of restrictions imposed by the null hypothesis, 1 in the present case.

The basic idea behind the LR test is simple: If the a priori restriction(s) is valid, the restricted and unrestricted (log) LF
should not differ much, in which case $\lambda$ in Eq. (4) will be close to zero. But if that is not the case, the two LFs will diverge. And
since in a large sample we know that $\lambda$ follows the chi-square distribution, we can find out if the divergence is statistically
significant, say, at a 1 or 5 percent level of significance. Alternatively, we can find the p value of the estimated $\lambda$.

Let us illustrate the LR test with our child mortality example. If we regress child mortality (CM) on per capita GNP
(PGNP) and female literacy rate (FLR) as we did in Eq. (8.1.4), we obtain a ULLF of −328.1012, but if we regress CM on
PGNP only, we obtain an RLLF of −361.6396. In absolute value (i.e., disregarding the sign), the former is smaller than
the latter, which makes sense since we have an additional variable in the former model.

The question now is whether it is worth adding the FLR variable. If it is not, the restricted and unrestricted LLF should 
not differ much, but if it is, the LLFs will be different. To see if this difference is statistically significant, we now use the 
LR test given in Eq. (4), which gives: 


$\lambda = 2[-328.1012 - (-361.6396)] = 67.0768$


Asymptotically, this is distributed as the chi-square distribution with 1 df (because we have only one restriction
imposed when we omitted the FLR variable from the full model). The p value of obtaining such a chi-square value for 1
df is almost zero, leading to the conclusion that the FLR variable should not be excluded from the model. In other words,
the restricted regression in the present instance is not valid.
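The arithmetic of the child mortality example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `lr_test_one_restriction` is our own, and the sketch relies on the fact that for 1 df the chi-square tail probability equals erfc(√(x/2)), which avoids the need for a statistics package.

```python
import math


def lr_test_one_restriction(ullf: float, rllf: float) -> tuple:
    """Return the LR statistic of Eq. (4) and its asymptotic p value.

    Valid only for a single restriction (1 df): if Z is standard normal,
    P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    lam = 2.0 * (ullf - rllf)
    p_value = math.erfc(math.sqrt(lam / 2.0))
    return lam, p_value


# Log-likelihood values reported in the text for the child mortality example
lam, p = lr_test_one_restriction(ullf=-328.1012, rllf=-361.6396)
print(f"lambda = {lam:.4f}, p value = {p:.1e}")
```

For more than one restriction the closed form no longer applies, and the p value would come from a chi-square routine with r df.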

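Letting RRSS and URSS denote the restricted and unrestricted residual sums of squares, Eq. (4) can also be expressed
as:

$\lambda = n(\ln \text{RRSS} - \ln \text{URSS})$    (5)

which is asymptotically distributed as $\chi^2$ with $r$ degrees of freedom, where $r$ is the number of restrictions imposed on the model (i.e., the
number of coefficients omitted from the original model). This follows because, evaluated at the ML estimates (with $\hat{\sigma}^2 = \text{RSS}/n$),
the log LF equals $-\frac{n}{2}\ln(\text{RSS}/n) - \frac{n}{2}\ln(2\pi) - \frac{n}{2}$, so in Eq. (4) every term except the one involving RSS cancels.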
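The residual-sum-of-squares form of the LR statistic can be checked numerically. The sketch below assumes NumPy and uses simulated data (the coefficients 1.0, 0.5, 0.3, the seed, and the sample size of 50 are arbitrary choices, not taken from the text). It fits the unrestricted and restricted versions of model (1), evaluates the log LF at the ML estimates, and compares 2(ULLF − RLLF) with n(ln RRSS − ln URSS).

```python
import numpy as np


def max_log_likelihood(y, X):
    """Normal log-likelihood (the form of Eq. (2)) evaluated at the ML estimates.

    Returns the maximized log LF and the residual sum of squares.
    """
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ML = OLS for the coefficients
    resid = y - X @ beta
    rss = float(resid @ resid)
    sigma2_ml = rss / n                           # ML divides RSS by n, not n - k
    llf = (-(n / 2) * np.log(sigma2_ml)
           - (n / 2) * np.log(2 * np.pi)
           - rss / (2 * sigma2_ml))
    return float(llf), rss


# Hypothetical simulated data for the three-variable model (1)
rng = np.random.default_rng(0)
n = 50
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 1.0 + 0.5 * x2 + 0.3 * x3 + rng.normal(size=n)

X_u = np.column_stack([np.ones(n), x2, x3])  # unrestricted model: B1, B2, B3
X_r = X_u[:, :2]                             # restricted model: B3 = 0

ullf, urss = max_log_likelihood(y, X_u)
rllf, rrss = max_log_likelihood(y, X_r)

lam_llf = 2 * (ullf - rllf)                  # Eq. (4)
lam_rss = n * (np.log(rrss) - np.log(urss))  # RSS form of the same statistic
print(lam_llf, lam_rss)
```

The two numbers agree up to rounding error, whatever data are used, because the identity is algebraic.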


*This expression can also be written as $-2(\text{RLLF} - \text{ULLF})$ or as $-2\ln(\text{RLF}/\text{ULF})$.
Although we will not go into the details of the Wald and LM tests, these tests can be implemented as follows:

Wald statistic (W) $= \dfrac{(n-k)(\text{RRSS} - \text{URSS})}{\text{URSS}} \sim \chi^2_r$    (6)

Lagrange multiplier statistic (LM) $= \dfrac{(n-k+r)(\text{RRSS} - \text{URSS})}{\text{RRSS}} \sim \chi^2_r$    (7)


where k is the number of regressors in the unrestricted model and r is the number of restrictions. 

As you can see from the preceding equations, all three tests are asymptotically (i.e., in large samples) equivalent, that 
is, they give similar answers. However, in small samples the answers can differ. There is an interesting relationship among 
these statistics in that it can be shown that: 


$W \ge LR \ge LM$


Therefore, in small samples, a hypothesis can be rejected by the Wald statistic but not rejected by the LM statistic.*
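To see the three statistics side by side, the following sketch computes them from the restricted and unrestricted RSS. The inputs are hypothetical, and the second function is our own addition: it uses n as the multiplier in all three statistics, the form for which the ordering W ≥ LR ≥ LM holds exactly, because x − 1 ≥ ln x ≥ 1 − 1/x for x = RRSS/URSS ≥ 1. With the degrees-of-freedom adjustments of Eqs. (6) and (7), the ordering need not hold exactly in a given small sample; with the illustrative numbers below, W (5.4) comes out slightly smaller than LR (about 5.47).

```python
import math


def wald_lr_lm(rrss, urss, n, k, r):
    """W, LR and LM statistics in the forms given in this appendix.

    rrss, urss: restricted and unrestricted residual sums of squares
    n: sample size, k: regressors in the unrestricted model, r: restrictions
    """
    wald = (n - k) * (rrss - urss) / urss       # Eq. (6)
    lr = n * (math.log(rrss) - math.log(urss))  # RSS form of the LR statistic
    lm = (n - k + r) * (rrss - urss) / rrss     # Eq. (7)
    return wald, lr, lm


def wald_lr_lm_uncorrected(rrss, urss, n):
    """Same statistics with n as the multiplier throughout.

    With x = rrss / urss >= 1 these are n(x - 1), n ln x and n(1 - 1/x),
    and x - 1 >= ln x >= 1 - 1/x guarantees W >= LR >= LM.
    """
    x = rrss / urss
    return n * (x - 1), n * math.log(x), n * (1 - 1 / x)


# Hypothetical residual sums of squares, for illustration only
print(wald_lr_lm(rrss=120.0, urss=100.0, n=30, k=3, r=1))
print(wald_lr_lm_uncorrected(rrss=120.0, urss=100.0, n=30))
```

As n grows with the RSS ratio held fixed, the two sets of formulas converge, which is the sense in which the three tests are asymptotically equivalent.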

As noted in the text, for most of our purposes the t and F tests will suffice. But the three tests discussed above are of 
general applicability in that they can be applied to testing nonlinear hypotheses in linear models, or testing restrictions
on variance-covariance matrices. They also can be applied in situations where the assumption that the errors are normally 
distributed is not tenable. 

Because of the mathematical complexity of the Wald and LM tests, we will not go into more detail here. But as noted, 
asymptotically, the LR, Wald, and LM tests give identical answers, the choice of the test depending on computational 
convenience. 


*For an explanation, see G.S. Maddala, Introduction to Econometrics, 3d ed., John Wiley & Sons, New York, 2001, p. 177. 


CHAPTER 9


Dummy Variable 
Regression Models 


In Chapter 1 we briefly discussed the four types of variables that one generally encounters in empirical
analysis: ratio scale, interval scale, ordinal scale, and nominal scale. The variables
that we have encountered in the preceding chapters were essentially ratio scale. But this should not give the
impression that regression models can deal only with ratio scale variables. Regression models can also handle
the other types of variables mentioned previously. In this chapter, we consider models that may involve not only
ratio scale variables but also nominal scale variables. Such variables are also known as indicator variables,
categorical variables, qualitative variables, or dummy variables.¹


9.1 The Nature of Dummy Variables 


In regression analysis the dependent variable, or regressand, is frequently influenced not only by ratio scale 
variables (e.g., income, output, prices, costs, height, temperature) but also by variables that are essentially 
qualitative, or nominal scale, in nature, such as sex, race, color, religion, nationality, geographical region, 
political upheavals, and party affiliation. For example, holding all other factors constant, female workers are
found to earn less than their male counterparts, and nonwhite workers are found to earn less than whites.² This
pattern may result from sex or racial discrimination, but whatever the reason, qualitative variables such as sex 
and race seem to influence the regressand and clearly should be included among the explanatory variables, 
or the regressors. 

Since such variables usually indicate the presence or absence of a “quality” or an attribute, such as male 
or female, black or white, Catholic or non-Catholic, Democrat or Republican, they are essentially nominal 
scale variables. One way we could “quantify” such attributes is by constructing artificial variables that take 
on values of 1 or 0, 1 indicating the presence (or possession) of that attribute and 0 indicating the absence of 
that attribute. For example, 1 may indicate that a person is female and 0 may designate a male; or 1 may
indicate that a person is a college graduate, and 0 that the person is not, and so on.


1We will discuss ordinal scale variables in Chapter 15.
2For a review of the evidence on this subject, see Bruce E. Kaufman and Julie L. Hotchkiss, The Economics of Labor Markets, 
5th ed., Dryden Press, New York, 2000. 
Variables that assume such 0 and 1 values are called dummy variables.³ Such variables are thus essentially
a device to classify data into mutually exclusive categories such as male or female.

Dummy variables can be incorporated in regression models just as easily as quantitative variables. As a 
matter of fact, a regression model may contain regressors that are all exclusively dummy, or qualitative, in 
nature. Such models are called Analysis of Variance (ANOVA) models.⁴


9.2 ANOVA Models 


To illustrate the ANOVA models, consider the following example. 


Example 9.1 Consumption Expenditure by Geographical Region 


Table 9.1 gives data on average per capita consumption expenditure (in rupees) for 17 states in India for the
year 2006-07. The data pertain to consumption expenditure per person per 30 days. These 17 states are
classified into three geographical regions: (1) East (6 states), (2) North-west-central (7 states), and (3) South
(4 states). For the time being, do not worry about the format of the table and the other data given in the table.

Suppose we want to find out if the average per-capita consumption expenditure (PCE) differs among 
the three geographical regions of the country. If you take the simple arithmetic average of the average PCE 
in the three regions, you will find that these averages for the three regions are as follows: Rs. 856.33 (East), 
Rs. 1067.29 (North-west-central), and Rs. 1097.38 (South). These numbers look different, but are they statistically
different from one another? There are various statistical techniques to compare two or more mean
values, which generally go by the name of analysis of variance.⁵ But the same objective can be accomplished
within the framework of regression analysis.

To see this consider the following model: 


$Y_i = B_1 + B_2 D_{2i} + B_3 D_{3i} + u_i$    (9.2.1)
where $Y_i$ = average consumption expenditure (Rs.) per person per 30 days in state $i$
$D_{2i}$ = 1 if the state is in the Eastern region of India
= 0 otherwise (i.e., in another region of the country)
$D_{3i}$ = 1 if the state is in the north-west-central region of the country
= 0 otherwise (i.e., in another region of the country)


Note that Eq. (9.2.1) is like any multiple regression model considered previously, except that, instead of 
quantitative regressors, we have only qualitative, or dummy, regressors, taking the value of 1 if the observation
belongs to a particular category and 0 if it does not belong to that category or group. Hereafter, we shall
designate all dummy variables by the letter D. Table 9.1 shows the dummy variables thus constructed.
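Constructing the dummies is mechanical. The sketch below, in plain Python, reproduces the D2 and D3 columns of Table 9.1 from each state's region, leaving the South as the category without a dummy. The short label "NWC" for north-west-central and the helper name `dummy` are our own.

```python
# Region of each state in Table 9.1, in the order the table lists them
states = [
    ("Andhra Pradesh", "South"), ("Assam", "East"), ("Bihar", "East"),
    ("Chhattisgarh", "East"), ("Gujarat", "NWC"), ("Haryana", "NWC"),
    ("Jharkhand", "East"), ("Karnataka", "South"), ("Kerala", "South"),
    ("Madhya Pradesh", "NWC"), ("Maharashtra", "NWC"), ("Orissa", "East"),
    ("Punjab", "NWC"), ("Rajasthan", "NWC"), ("Tamil Nadu", "South"),
    ("Uttar Pradesh", "NWC"), ("West Bengal", "East"),
]


def dummy(labels, category):
    """1 if the observation belongs to `category`, 0 otherwise."""
    return [1 if label == category else 0 for label in labels]


regions = [region for _, region in states]
d2 = dummy(regions, "East")  # D2 in Eq. (9.2.1)
d3 = dummy(regions, "NWC")   # D3 in Eq. (9.2.1); the South gets no dummy

for (name, _), a, b in zip(states, d2, d3):
    print(f"{name:15s} D2={a} D3={b}")
```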

What does the model (9.2.1) tell us? Assuming that the error term satisfies the usual OLS assumptions, on 
taking expectation of Eq. (9.2.1) on both sides, we obtain: 


3It is not absolutely essential that dummy variables take the values of 0 and 1. The pair (0, 1) can be transformed into any
other pair by a linear function such that Z = a + bD (b ≠ 0), where a and b are constants and where D = 1 or 0. When D =
1, we have Z = a + b, and when D = 0, we have Z = a. Thus the pair (0, 1) becomes (a, a + b). For example, if a = 1 and b = 2,
the dummy variables will be (1, 3). This expression shows that qualitative, or dummy, variables do not have a natural scale
of measurement. That is why they are described as nominal scale variables.


4ANOVA models are used to assess the statistical significance of the relationship between a quantitative regressand and
qualitative or dummy regressors. They are often used to compare the differences in the mean values of two or more groups
or categories, and are therefore more general than the t test, which can be used to compare the means of two groups or
categories only.


5For an applied treatment, see John Fox, Applied Regression Analysis, Linear Models, and Related Methods, Sage Publications,
1997, Chapter 8.
Mean per capita consumption expenditure in the Eastern region:

$E(Y_i \mid D_{2i} = 1, D_{3i} = 0) = B_1 + B_2$    (9.2.2)

Mean per capita consumption expenditure in the north-west-central region:

$E(Y_i \mid D_{2i} = 0, D_{3i} = 1) = B_1 + B_3$    (9.2.3)


You might wonder how we find out the mean per capita consumption expenditure for the southern region.
If you guessed that this is equal to $B_1$, you would be absolutely right, for
Mean per capita consumption expenditure in the southern region:

$E(Y_i \mid D_{2i} = 0, D_{3i} = 0) = B_1$    (9.2.4)

In other words, the mean per capita consumption expenditure in the South is given by the intercept, $B_1$,
in the multiple regression (9.2.1), and the “slope” coefficients $B_2$ and $B_3$ tell by how much the mean per
capita consumption expenditure in the East and in the north-west-central region differ from the mean per capita
consumption expenditure in the South. But how do we know if these differences are statistically significant?
Before we answer this question, let us present the results based on the regression (9.2.1). Using the data given
in Table 9.1, we obtain the following results:


$\hat{Y}_i = 1097.38 - 241.04\,D_{2i} - 30.09\,D_{3i}$    (9.2.5)
se = (103.31) (133.37) (129.50)
t = (10.62) (-1.81) (-0.23)
(0.00)* (0.09)* (0.82)*

where * indicates the p values.


Table 9.1 Average per capita consumption expenditure by state, 2006-07

State              Per capita consumption    Household   D2   D3
                   expenditure (Rs.)         size
Andhra Pradesh     1044.0                    3.8         0    0
Assam              1045.0                    4.5         1    0
Bihar              703.0                     5.2         1    0
Chhattisgarh       788.0                     4.9         1    0
Gujarat            1109.5                    4.9         0    1
Haryana            1174.5                    5.1         0    1
Jharkhand          836.0                     5.1         1    0
Karnataka          902.0                     4.4         0    0
Kerala             1465.5                    4.0         0    0
Madhya Pradesh     758.5                     5.3         0    1
Maharashtra        1224.5                    4.4         0    1
Orissa             765.5                     4.4         1    0
Punjab             1403.5                    4.5         0    1
Rajasthan          976.0                     5.2         0    1
Tamil Nadu         978.0                     3.7         0    0
Uttar Pradesh      824.5                     5.3         0    1
West Bengal        1000.5                    4.2         1    0


Source: “Household Consumer Expenditure in India—2006-07”, 2008 National Sample Survey Organization, Govt. of India 
As these regression results show, the mean per capita consumption expenditure in the South is about
Rs. 1097.38; in the eastern region it is lower by about Rs. 241.04, and in the north-west-central region it is
lower by about Rs. 30.09. The actual mean per capita consumption expenditure in the latter two regions can be
easily obtained by adding these differential per capita consumptions to the mean per capita consumption in
the South, as shown in Eqs. (9.2.2) and (9.2.3). Doing this, we find that the mean per capita consumption in
these two regions is about Rs. 856.33 and Rs. 1067.29, respectively.

But how do we know that these mean consumption levels are statistically different from the mean per
capita consumption in the southern region, the comparison category? That is easy enough. All we have to
do is find out if each of the “slope” coefficients in Eq. (9.2.5) is statistically significant. As can be seen from
the regression, the estimated slope coefficient for the eastern region is statistically significant at the 9 percent
level of significance, as its p value is about 9 percent. But at the conventional 1 and 5 percent levels of
significance, this slope coefficient is not statistically significant. The slope coefficient for the north-west-central
region is not statistically significant either, as its p value is 82 percent. Therefore the overall conclusion
is that, statistically, the mean per capita consumption expenditure in the eastern region, the north-west-central
region, and the South is about the same. Diagrammatically, the situation is shown in Fig. 9.1.


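[Figure: mean per capita consumption expenditure by region. South: $\hat{B}_1$ = Rs. 1097.38; North-west-central: $\hat{B}_1 + \hat{B}_3$ = Rs. 1067.29; East: $\hat{B}_1 + \hat{B}_2$ = Rs. 856.33]

Fig. 9.1 Average per capita consumption expenditure (in rupees) in three regions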
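The estimates in Eq. (9.2.5) can be reproduced from Table 9.1. The sketch below assumes NumPy is available and computes the coefficients, their OLS standard errors and t ratios, and the regional means implied by Eqs. (9.2.2) to (9.2.4). The p values are left out because they require a t-distribution routine (for example, from SciPy).

```python
import numpy as np

# Table 9.1: per capita consumption expenditure (Rs.) and regional dummies
y = np.array([1044.0, 1045.0, 703.0, 788.0, 1109.5, 1174.5, 836.0, 902.0,
              1465.5, 758.5, 1224.5, 765.5, 1403.5, 976.0, 978.0, 824.5, 1000.5])
d2 = np.array([0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1])  # East
d3 = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0])  # North-west-central

X = np.column_stack([np.ones_like(y), d2, d3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# OLS standard errors: sigma^2 estimated with n - k degrees of freedom
n, k = X.shape
resid = y - X @ beta
sigma2 = resid @ resid / (n - k)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t = beta / se

print("coefficients:", beta.round(2))
print("se:", se.round(2))
print("t:", t.round(2))

# Regional means implied by Eqs. (9.2.2)-(9.2.4)
south = beta[0]
east = beta[0] + beta[1]
nwc = beta[0] + beta[2]
print("South, East, NWC means:", round(south, 2), round(east, 2), round(nwc, 2))
```

The implied means coincide with the simple regional averages quoted in Example 9.1, which is what the interpretation of the dummy coefficients leads one to expect.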


A caution is in order in interpreting these differences. The dummy variables will simply point out the differences,
if they exist, but they do not suggest the reasons for the differences. Differences in household size,
family income, inflation rate, and wealth of the family may all have some effect on the observed differences.
Therefore, unless we take into account all the other variables that may affect per capita consumption expenditure,
we will not be able to pin down the cause(s) of the differences.

From the preceding discussion, it is clear that all one has to do is see if the coefficients attached to the 
various dummy variables are individually statistically significant. This example also shows how easy it is to 
incorporate qualitative, or dummy, regressors in the regression models. 


Caution in the Use of Dummy Variables 


Although dummy variables are easy to incorporate in regression models, they must be used carefully.
In particular, consider the following aspects: 
1. In Example 9.1, to distinguish the three regions, we used only two dummy variables, $D_2$ and $D_3$. Why
did we not use three dummies to distinguish the three regions? Suppose we do that and write the model
(9.2.1) as:

$Y_i = \alpha + B_1 D_{1i} + B_2 D_{2i} + B_3 D_{3i} + u_i$    (9.2.6)

where $D_{1i}$ takes a value of 1 for states in the South and 0 otherwise. Thus, we now have a dummy variable for
each of the three geographical regions. Using the data in Table 9.1, if you were to run the regression (9.2.6),
the computer would “refuse” to run the regression (try it).⁶ Why? The reason is that in the setup of Eq. (9.2.6),
where you have a dummy variable for each category or group and also an intercept, you have a case of perfect
collinearity, that is, an exact linear relationship among the variables. Why? Refer to Table 9.1. Imagine that
we now add the $D_1$ column, taking the value of 1 whenever a state is in the South and 0 otherwise. Now if
you add the three D columns horizontally, you will obtain a column that has 17 ones in it. But since the regressor
attached to the intercept $\alpha$ is (implicitly) 1 for each observation, you will have a column that also contains 17 ones.
In other words, the sum of the three D columns will simply reproduce the intercept column, thus leading to
perfect collinearity. In this case, estimation of the model (9.2.6) is impossible.

The message here is: If a qualitative variable has m categories, introduce only (m − 1) dummy
variables. In our example, since the qualitative variable “region” has three categories, we introduced only
two dummies. If you do not follow this rule, you will fall into what is called the dummy variable trap, that
is, the situation of perfect collinearity, or perfect multicollinearity if there is more than one exact relationship
among the variables. This rule also applies if we have more than one qualitative variable in the model, an
example of which is presented later. Thus we should restate the preceding rule as: For each qualitative
regressor, the number of dummy variables introduced must be one less than the number of categories of that
variable. Thus, if in Example 9.1 we had information about whether the persons surveyed resided in a rural
or an urban area, we would use one additional dummy variable (but not two), taking a value of 1 for urban and 0
for rural, or vice versa.

2. The category for which no dummy variable is assigned is known as the base, benchmark, control, 
comparison, reference, or omitted category. And all comparisons are made in relation to the benchmark 
category. 

3. The intercept value ($B_1$) represents the mean value of the benchmark category. In Example 9.1, the
benchmark category is the South. Hence, in the regression (9.2.5) the intercept value of about
1097.38 represents the mean per capita consumption expenditure in the southern states.

4. The coefficients attached to the dummy variables in Eq. (9.2.1) are known as the differential intercept
coefficients because they tell by how much the value of the category that receives the value of 1 differs from
the intercept coefficient of the benchmark category. For example, in Eq. (9.2.5), the value of about −241.04
tells us that the mean per capita consumption expenditure in the East (about Rs. 856) is lower by about Rs. 241 than the mean
per capita consumption expenditure of about Rs. 1097 for the benchmark category, the South.

5. When a qualitative variable has several categories, as in our illustrative example, the choice of the
benchmark category is strictly up to the researcher. Sometimes the choice of the benchmark is dictated by
the particular problem at hand. In our illustrative example, we could have chosen the East as the
benchmark category. In that case the regression results given in Eq. (9.2.5) will change, because now all
comparisons are made in relation to the East. Of course, this will not change the overall conclusion
of our example (why?). In this case, the intercept value will be about Rs. 856, which is the mean per capita
consumption expenditure in the East.


6Actually, you will get a message saying that the data matrix is singular.
6. We warned above about the dummy variable trap. There is a way to circumvent this trap by introducing
as many dummy variables as the number of categories of that variable, provided we do not introduce the
intercept in such a model. Thus, if we drop the intercept term from Eq. (9.2.6) and consider the following
model,

$Y_i = B_1 D_{1i} + B_2 D_{2i} + B_3 D_{3i} + u_i$    (9.2.7)

we do not fall into the dummy variable trap, as there is no longer perfect collinearity. But make sure that when
you run this regression, you use the no-intercept option in your regression package.

How do we interpret regression (9.2.7)? If you take the expectation of Eq. (9.2.7), you will find that:

$B_1$ = mean per capita consumption expenditure in the South

$B_2$ = mean per capita consumption expenditure in the East

$B_3$ = mean per capita consumption expenditure in the North-West-Central region

In other words, with the intercept suppressed, and allowing a dummy variable for each category, we obtain
directly the mean values of the various categories. The results of Eq. (9.2.7) for our illustrative example are
as follows:

$\hat{Y}_i = 1097.38\,D_{1i} + 856.33\,D_{2i} + 1067.29\,D_{3i}$    (9.2.8)
se = (103.31) (84.35) (78.09)
t = (10.62)* (10.15)* (13.67)*
R² = 0.044

where * indicates that the p values of these t ratios are very small.

As you can see, the dummy coefficients give directly the mean (per capita consumption expenditure)
values in the three regions: South, East, and North-West-Central.

7. Which is a better method of introducing a dummy variable: (1) introduce a dummy for each category
and omit the intercept term or (2) include the intercept term and introduce only (m − 1) dummies, where m is
the number of categories of the qualitative variable? As Kennedy notes:⁷


Most researchers find the equation with an intercept more convenient because it allows them to address more easily 
the questions in which they usually have the most interest, namely, whether or not the categorization makes a 
difference, and if so, by how much. If the categorization does make a difference, by how much is measured directly 
by the dummy variable coefficient estimates. Testing whether or not the categorization is relevant can be done by 
running a t test of a dummy variable coefficient against zero (or, to be more general, an F test on the appropriate
set of dummy variable coefficient estimates). 
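Points 1, 6, and 7 can be illustrated with the Table 9.1 data. The sketch below assumes NumPy. It checks the rank of the design matrix of Eq. (9.2.6), fits the no-intercept model (9.2.7), and computes the F statistic for the joint restriction B2 = B3 = 0 in Eq. (9.2.1) from the restricted and unrestricted RSS, the restricted model being a regression on the intercept alone.

```python
import numpy as np

# Table 9.1 data: consumption expenditure (Rs.) and regional dummies
y = np.array([1044.0, 1045.0, 703.0, 788.0, 1109.5, 1174.5, 836.0, 902.0,
              1465.5, 758.5, 1224.5, 765.5, 1403.5, 976.0, 978.0, 824.5, 1000.5])
d2 = np.array([0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1], dtype=float)  # East
d3 = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0], dtype=float)  # NWC
d1 = 1 - d2 - d3                                                                  # South
ones = np.ones_like(y)

# Point 1: intercept plus a dummy for every region gives perfect collinearity.
# The three dummy columns add up to the column of ones, so the rank is 3, not 4.
# (np.linalg.lstsq would not refuse to run here; it silently returns a
# minimum-norm solution, so checking the rank is the safer diagnostic.)
X_trap = np.column_stack([ones, d1, d2, d3])
print("rank of the (9.2.6) design matrix:", np.linalg.matrix_rank(X_trap))  # 3

# Point 6: no intercept, one dummy per region: coefficients are the group means
b_means, *_ = np.linalg.lstsq(np.column_stack([d1, d2, d3]), y, rcond=None)
print("South, East, NWC means:", b_means.round(2))

# Point 7: F test that the two differential intercepts in (9.2.1) are jointly zero
X_u = np.column_stack([ones, d2, d3])
b_u, *_ = np.linalg.lstsq(X_u, y, rcond=None)
urss = np.sum((y - X_u @ b_u) ** 2)
rrss = np.sum((y - y.mean()) ** 2)  # restricted model: intercept only
n, k, r = len(y), X_u.shape[1], 2
F = ((rrss - urss) / r) / (urss / (n - k))
print(f"F({r}, {n - k}) = {F:.2f}")
```

With these data F comes to about 2.27, below the 5 percent critical value of about 3.74 for (2, 14) df, which agrees with the conclusion drawn from the individual t tests.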


9.3 ANOVA Models with Two Qualitative Variables 


In the previous section we considered an ANOVA model with one qualitative variable with three categories. 
In this section we consider another ANOVA model, but with two qualitative variables, and bring out some 
additional points about dummy variables. 


Example 9.2: Literacy Rate in Relation to Gender and Area of Residence 


Table 9.2 gives data on the literacy rate for the population above 7 years of age across 19 states for the period 2006-07. The
data are classified by gender (male and female) and area of residence (urban and rural). In this example
we have two qualitative regressors, each with two categories. Hence, we assign a single dummy variable to
each qualitative regressor as follows:


7Peter Kennedy, A Guide to Econometrics, 4th ed., MIT Press, Cambridge, Mass., 1998, p. 223. 
Table 9.2: Literacy rate (in %) across states in India, 2006-07

                    Urban            Rural            Per capita net state
State               Male   Female    Male   Female    domestic product
Andhra Pradesh      85     70        63     42        30439
Assam               98     90        87     73        20194
Bihar               86     70        65     38        9796
Chhattisgarh        88     76        79     54        24556
Gujarat             93     80        78     55        39459
Haryana             86     71        83     58        50611
Himachal Pradesh    96     88        89     73        36766
Jammu & Kashmir     87     67        75     56        22426
Jharkhand           92     80        71     45        18474
Karnataka           88     75        73     53        31713
Kerala              97     92        96     89        37947
Madhya Pradesh      87     76        72     49        16875
Maharashtra         94     85        83     63        41144
Punjab              85     77        75     59        39874
Rajasthan           85     64        73     39        21203
Tamil Nadu          94     81        81     61        37190
Tripura             89     86        84     70        27816
Uttar Pradesh       82     69        76     49        14663
West Bengal         90     80        80     62        27905

Source: “Household Consumer Expenditure in India—2006-07”, 2008, National Sample Survey Organization, Govt. of India, and Handbook of Statistics
on Indian Economy—2009-10, RBI, Mumbai.


$Y_i = B_1 + B_2 D_{2i} + B_3 D_{3i} + u_i$    (9.3.1)
where $Y_i$ = literacy rate (percent)
$D_{2i}$ = gender: 1 = female, 0 = otherwise
$D_{3i}$ = area of residence: 1 = urban, 0 = otherwise
Using the data given in Table 9.2, we obtain the following results:

$\hat{Y}_i = 75.82 - 16.32\,D_{2i} + 16.00\,D_{3i}$    (9.3.2)
se = (1.82) (2.10) (2.10)
t = (50.52) (-7.77) (7.62)
(0.00)* (0.00)* (0.00)*
where * denotes the p values.

In this regression, which category is the benchmark category? Obviously, it is rural males. In other words,
males who do not live in urban areas are the omitted category. Therefore, all comparisons are made in
relation to this group. The mean literacy rate in this benchmark is about 75.82 percent. Compared with this,
the average literacy rate for rural females is lower by about 16.32 percentage points, for an actual average literacy rate of 59.50
percent (75.82 − 16.32). By contrast, for urban males the mean literacy rate is higher by
about 16 percentage points, for an actual average literacy rate of 91.82 percent (75.82 + 16).

Are the preceding average literacy rates statistically different from that of the base category? They are, for
all the differential intercepts are statistically significant, as their p values are quite low.
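The coefficients in Eq. (9.3.2) can be reproduced by stacking the four literacy columns of Table 9.2 into a single variable with 76 observations. This NumPy sketch assumes that each state contributes one observation per gender-area cell, an assumption consistent with the published coefficients. It also prints the cell means implied by the model, including urban females, the one group not discussed above.

```python
import numpy as np

# Table 9.2 literacy rates (%), 19 states in table order
urban_male = [85, 98, 86, 88, 93, 86, 96, 87, 92, 88, 97, 87, 94, 85, 85, 94, 89, 82, 90]
urban_female = [70, 90, 70, 76, 80, 71, 88, 67, 80, 75, 92, 76, 85, 77, 64, 81, 86, 69, 80]
rural_male = [63, 87, 65, 79, 78, 83, 89, 75, 71, 73, 96, 72, 83, 75, 73, 81, 84, 76, 80]
rural_female = [42, 73, 38, 54, 55, 58, 73, 56, 45, 53, 89, 49, 63, 59, 39, 61, 70, 49, 62]

# Stack into one long column: 4 gender-area cells x 19 states = 76 observations
y = np.array(urban_male + urban_female + rural_male + rural_female, dtype=float)
m = len(urban_male)
female = np.repeat([0.0, 1.0, 0.0, 1.0], m)  # D2: 1 = female
urban = np.repeat([1.0, 1.0, 0.0, 0.0], m)   # D3: 1 = urban

X = np.column_stack([np.ones_like(y), female, urban])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("B1, B2, B3:", b.round(2))

# Mean literacy rates implied by the additive model (9.3.1)
print("rural male:", round(b[0], 2))
print("rural female:", round(b[0] + b[1], 2))
print("urban male:", round(b[0] + b[2], 2))
print("urban female:", round(b[0] + b[1] + b[2], 2))
```

Because model (9.3.1) has no gender-area interaction term, these implied means need not equal the raw cell averages of Table 9.2.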

The point to note about this example is as follows: Once you go beyond one qualitative variable, you have 
to pay close attention to the category that is treated as the base category, since all comparisons are made in
relation to that category. This is especially important when you have several qualitative regressors, each with several 
categories. But the mechanics of introducing several qualitative variables should be clear by now. 
9.4 Regression with a Mixture of Quantitative and
Qualitative Regressors: The ANCOVA Models 


ANOVA models of the type discussed in the preceding two sections, although common in fields such as 
sociology, psychology, education, and market research, are not that common in economics. Typically, in most 
economic research a regression model contains some explanatory variables that are quantitative and some that 
are qualitative. Regression models containing an admixture of quantitative and qualitative variables are called 
analysis of covariance (ANCOVA) models. ANCOVA models are an extension of the ANOVA models in 
that they provide a method of statistically controlling the effects of quantitative regressors, called covariates 
or control variables, in a model that includes both quantitative and qualitative, or dummy, regressors. We 
now illustrate the ANCOVA models. 


Example 9.3: Average Per Capita Consumption Expenditure in Relation to Region and Household 
Size 


To motivate the analysis, let us reconsider Example 9.1 by arguing that the average per capita consumption
expenditure may not differ among the three regions once we take into account other variables that vary
across the regions. Consider, for example, household size: the number of persons
in a family influences the per capita consumption expenditure. To see if this is the case, we develop the
following model:

$Y_i = B_1 + B_2 D_{2i} + B_3 D_{3i} + B_4 X_i + u_i$    (9.4.1)

where $Y_i$ = average consumption expenditure (Rs.) per person per 30 days in state $i$
$X_i$ = average household size (number of persons) in state $i$
$D_{2i}$ = 1 if the state is in the Eastern region of India
= 0 otherwise
$D_{3i}$ = 1 if the state is in the north-west-central region of the country
= 0 otherwise
The data on X are given in Table 9.1. Keep in mind that we are treating the South as the benchmark category.
Also, note that besides the two qualitative regressors, we have a quantitative variable, X, which in the context
of the ANCOVA models is known as the covariate, as noted earlier.
From the data in Table 9.1, the results of the model (9.4.1) are as follows: 


$\hat{Y}_i = 2454.72 + 16.06\,D_{2i} + 314.02\,D_{3i} - 344.72\,X_i$    (9.4.2)
se = (505.33) (145.22) (165.64) (126.49)
t = (4.86)* (0.11)** (1.90)** (-2.73)*
R² = 0.5192
where * indicates p values less than 5 percent, and ** indicates p values greater than 5 percent.

As these results suggest, ceteris paribus, as household size goes up by one person, the per capita
consumption expenditure goes down, on average, by about Rs. 344.72. Controlling for household size, we now
see that neither differential intercept coefficient, for the East or for the North-West-Central region, is statistically significant.
These results are different from those of Eq. (9.2.5). But this should not be surprising, for in Eq. (9.2.5) we did
not account for the covariate, differences in household size. Diagrammatically, we have the situation shown
in Fig. 9.2.

Note that although we have shown three regression lines for the three regions, statistically the regression 
lines are the same for all three regions. Also note that the three regression lines are drawn parallel (why?). 
Figure 9.2 Per Capita Consumption Expenditure (Y) in Relation to Household Size (X)
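Model (9.4.1) can be fitted the same way, by appending household size to the design matrix. The NumPy sketch below uses the values printed in Table 9.1. Run on those values, it gives a household-size coefficient of roughly −334 rather than the −344.72 reported in Eq. (9.4.2), so read it as an illustration of the ANCOVA setup rather than an exact replication. Because the model has a single coefficient on X, the three regional lines share one slope and differ only in their intercepts.

```python
import numpy as np

# Table 9.1: expenditure (Rs.), household size, and regional dummies
y = np.array([1044.0, 1045.0, 703.0, 788.0, 1109.5, 1174.5, 836.0, 902.0,
              1465.5, 758.5, 1224.5, 765.5, 1403.5, 976.0, 978.0, 824.5, 1000.5])
x = np.array([3.8, 4.5, 5.2, 4.9, 4.9, 5.1, 5.1, 4.4, 4.0, 5.3, 4.4, 4.4,
              4.5, 5.2, 3.7, 5.3, 4.2])
d2 = np.array([0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1], dtype=float)  # East
d3 = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0], dtype=float)  # NWC

# ANCOVA design: intercept, two regional dummies, one covariate
X = np.column_stack([np.ones_like(y), d2, d3, x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("B1, B2, B3, B4:", b.round(2))

# One common slope, three intercepts: the parallel lines of Fig. 9.2
for name, intercept in [("South", b[0]), ("East", b[0] + b[1]), ("NWC", b[0] + b[2])]:
    print(f"{name:5s}: Y = {intercept:8.2f} + ({b[3]:.2f}) X")
```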


9.5 The Dummy Variable Alternative to the Chow Test⁸


In Section 8.7 we discussed the Chow test to examine the structural stability of a regression model. The 
example we discussed there related to the relationship between savings and income in India over the period 
1974-75 to 1995-96. We divided the sample period into two, 1974-75 to 1988-89 and 1989-90 to 1995-96, 
and showed on the basis of the Chow test that there was a difference in the regression of savings on income 
between the two periods. 

However, we could not tell whether the difference in the two regressions was because of differences in the
intercept terms, the slope coefficients, or both. Very often, knowing which of these is the source of the difference is itself very useful.

Referring to Eqs. (8.7.1) and (8.7.2), we see that there are four possibilities, which we illustrate in 
Figure 9.3. 


1. Both the intercept and the slope coefficients are the same in the two regressions. This, the case of 
coincident regressions, is shown in Figure 9.3a. 

2. Only the intercepts in the two regressions are different but the slopes are the same. This is the case of
parallel regressions, which is shown in Figure 9.3b.

3. The intercepts in the two regressions are the same, but the slopes are different. This is the situation of 
concurrent regressions (Figure 9.3c). 

4. Both the intercepts and slopes in the two regressions are different. This is the case of dissimilar regressions,
which is shown in Figure 9.3d.


The multistep Chow test procedure discussed in Section 8.7, as noted earlier, tells us only if two (or more) 
regressions are different without telling us what the source of the difference is. 

The source of difference, if any, can be pinned down by pooling all the observations (22 in all) and running
just one multiple regression, as shown below:⁹


$Y_t = \alpha_1 + \alpha_2 D_t + \beta_1 X_t + \beta_2 (D_t X_t) + u_t$    (9.5.1)


8The material in this section draws on the author’s articles, “Use of Dummy Variables in Testing for Equality between Sets 
of Coefficients in Two Linear Regressions: A Note,” and “Use of Dummy Variables . . . A Generalization,” both published in 
the American Statistician, vol. 24, nos. 1 and 5, 1970, pp. 50-52 and 18-21. 

9As in the Chow test, the pooling technique assumes homoscedasticity, that is, $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
[Figure: four panels plotting savings against income: (a) Coincident regressions, (b) Parallel regressions, (c) Concurrent regressions, (d) Dissimilar regressions]

Figure 9.3 Plausible savings-income regressions.


where Y = savings
X = income
t = time
D = 1 for observations in 1989-90 to 1995-96
= 0 otherwise (i.e., for observations in 1974-75 to 1988-89)
Table 9.3 shows the structure of the data matrix.
To see the implications of Eq. (9.5.1), and assuming, as usual, that $E(u_t) = 0$, we obtain:

Mean savings function for 1974-75 to 1988-89:

$$E(Y_t \mid D_t = 0, X_t) = \alpha_1 + \beta_1 X_t \qquad (9.5.2)$$

Mean savings function for 1989-90 to 1995-96:

$$E(Y_t \mid D_t = 1, X_t) = (\alpha_1 + \alpha_2) + (\beta_1 + \beta_2) X_t \qquad (9.5.3)$$


The reader will notice that these are the same functions as Eqs. (8.7.1) and (8.7.2), with $\lambda_1 = \alpha_1$, $\lambda_2 = \beta_1$,
$\gamma_1 = (\alpha_1 + \alpha_2)$, and $\gamma_2 = (\beta_1 + \beta_2)$. Therefore, estimating Eq. (9.5.1) is equivalent to estimating the two
individual savings functions in Eqs. (8.7.1) and (8.7.2).
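The equivalence can be checked numerically. The Python sketch below uses simulated data rather than the Table 9.3 figures (the subperiod sizes, coefficients, and noise level are all hypothetical). It fits the pooled regression (9.5.1) once, fits each subperiod separately, and compares the intercepts and slopes implied by the pooled fit with the separate estimates.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n1, n2 = 15, 7                                          # hypothetical subperiod sizes
x = np.sort(rng.uniform(60_000, 960_000, n1 + n2))      # hypothetical income
d = np.r_[np.zeros(n1), np.ones(n2)]                    # D_t = 0 in period 1, 1 in period 2
noise = rng.normal(0, 3_000, n1 + n2)
y = np.where(d == 0, -3_000 + 0.25 * x, -28_000 + 0.32 * x) + noise

# Pooled regression (9.5.1): regressors 1, D, X and D*X
a1, a2, b1, b2 = ols(np.column_stack([np.ones_like(x), d, x, d * x]), y)

# Separate regressions, one per subperiod
lam = ols(np.column_stack([np.ones(n1), x[:n1]]), y[:n1])   # (lambda1, lambda2)
gam = ols(np.column_stack([np.ones(n2), x[n1:]]), y[n1:])   # (gamma1, gamma2)

print("period 1:", (a1, b1), lam)
print("period 2:", (a1 + a2, b1 + b2), gam)
```

The point estimates coincide; what the pooled regression changes is the standard errors, because it estimates a single error variance for both periods (the homoscedasticity assumption).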
Table 9.3 Savings and Income Data (Rs. crore), India, 1974-75 to 1995-96

Year       Savings     Income   Dum    Year       Savings     Income   Dum
1974-75     12,298     64,968     0    1985-86     53,389    229,527     0
1975-76     14,196     69,233     0    1986-87     58,036    256,413     0
1976-77     17,320     73,824     0    1987-88     72,264    291,585     0
1977-78     19,995     85,267     0    1988-89     87,166    345,011     0
1978-79     23,601     91,507     0    1989-90    106,092    395,239     1
1979-80     24,213     99,632     0    1990-91    130,010    465,097     1
1980-81     26,881    123,067     0    1991-92    141,089    531,515     1
1981-82     30,896    142,181     0    1992-93    159,682    618,587     1
1982-83     33,787    157,291     0    1993-94    189,933    716,964     1
1983-84     38,091    185,749     0    1994-95    247,462    842,261     1
1984-85     45,453    207,491     0    1995-96    291,002    959,733     1

Source: Handbook of Statistics on Indian Economy (2009-10), Reserve Bank of India, Mumbai.
Note: Dum = 1 for observations beginning in 1989-90; 0 otherwise.
Income = personal disposable income, Rs. crore, 1999-2000 prices.
Savings = gross domestic savings, Rs. crore, 1999-2000 prices.


In Eq. (9.5.1), $\alpha_2$ is the differential intercept, as previously, and $\beta_2$ is the differential slope coefficient
(also called the slope drifter), indicating by how much the slope coefficient of the second period’s savings 
function (the category that receives the dummy value of 1) differs from that of the first period. Notice how the 
introduction of the dummy variable D in the interactive, or multiplicative, form (D multiplied by X) enables 
us to differentiate between slope coefficients of the two periods, just as the introduction of the dummy 
variable in the additive form enabled us to distinguish between the intercepts of the two periods. 


Example 9.4: Structural Differences in the Indian Savings-Income Regression, the Dummy Variable Approach


Before we proceed further, let us first present the regression results of model (9.5.1) applied to the Indian 
Savings-Income data. 


$$\hat{Y}_t = -3004.959 - 25595.554\,D_t + 0.249\,X_t + 0.075\,(D_t X_t)$$

se = (3303.149)   (8678.305)   (0.018)    (0.022)
t  = (-0.910)**   (-2.945)*    (13.752)*  (3.454)*          (9.5.4)
R² = 0.9951


where * indicates p value less than 5 percent and ** indicates p values greater than 5 percent. 

As these regression results show, both the differential intercept and slope coefficients are statistically
significant, strongly suggesting that the savings-income regressions for the two time periods are different, as
in Figure 9.3d.

From Eq. (9.5.4), we can derive the estimated counterparts of Eqs. (9.5.2) and (9.5.3), which are:

Mean savings function for 1974-75 to 1988-89:

$$\hat{Y}_t = -3004.959 + 0.249\,X_t \qquad (9.5.5)$$

Mean savings function for 1989-90 to 1995-96:

$$\hat{Y}_t = (-3004.959 - 25595.554) + (0.249 + 0.075)\,X_t = -28600.513 + 0.324\,X_t \qquad (9.5.6)$$
9.6 Interaction Effects Using Dummy Variables


Dummy variables are a flexible tool that can handle a variety of interesting problems. To see this, consider 
the following model: 


$$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i \qquad (9.6.1)$$

where Y = literacy rate (percent)
X = per capita net state domestic product (Rs.)
$D_2$ = gender: 1 = female, 0 = otherwise
$D_3$ = area of residence: 1 = urban, 0 = otherwise

In this model, gender and area of residence are qualitative regressors and per capita net state domestic product
(PNSDP) is a quantitative regressor.10 Implicit in this model is the assumption that the differential effect of
the gender dummy $D_2$ is constant across the two categories of area of residence and the differential effect of
the area-of-residence dummy $D_3$ is also constant across the two sexes. That is to say, if the mean literacy rate is
higher for males than for females, this is so whether they live in urban or rural areas. Likewise, if, say,
persons living in rural areas have a lower mean literacy rate, this is so whether they are female or male.

In many applications such an assumption may be untenable. Females living in rural areas, for instance, may lag
behind rural males in literacy by a different margin than urban females lag behind urban males. In other words,
there may be interaction between the two qualitative variables $D_2$ and $D_3$. Therefore their effect on mean Y
may not be simply additive as in Eq. (9.6.1) but multiplicative as well, as in the following model:

$$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 (D_{2i} D_{3i}) + \beta X_i + u_i \qquad (9.6.2)$$

where the variables are as defined for model (9.6.1).
From Eq. (9.6.2), we obtain:

$$E(Y_i \mid D_{2i} = 1, D_{3i} = 1, X_i) = (\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4) + \beta X_i \qquad (9.6.3)$$

which is the mean literacy rate function for females in urban areas.
Observe that

$\alpha_2$ = differential effect of being female

$\alpha_3$ = differential effect of residing in an urban area

$\alpha_4$ = differential effect of being a female residing in an urban area

which shows that the mean literacy rate of urban females differs (by $\alpha_4$) from the mean literacy rate
of females or of persons residing in urban areas. If, for instance, all three differential dummy coefficients are
negative, this would imply that urban females have a lower mean literacy rate than females or people
residing in urban areas, as compared with the base category, which in the present example is rural males.

Now the reader can see how the interaction dummy (i.e., the product of two qualitative or dummy 
variables) modifies the effect of the two attributes considered individually (i.e., additively). 
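One way to see what the interaction dummy does is to drop X and regress Y on $D_2$, $D_3$, and their product alone: with all four gender-residence cells represented, the fitted values are simply the four cell means. The Python sketch below uses simulated literacy rates (the sample size and effect sizes are hypothetical) and checks that $\alpha_1$, $\alpha_1 + \alpha_2$, $\alpha_1 + \alpha_3$, and $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4$ reproduce the rural-male, rural-female, urban-male, and urban-female means.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
female = rng.integers(0, 2, n)   # D2: 1 = female (simulated)
urban = rng.integers(0, 2, n)    # D3: 1 = urban (simulated)
# Simulated literacy rates with an interaction effect built in
y = 70 - 20 * female + 12 * urban + 9 * female * urban + rng.normal(0, 5, n)

X = np.column_stack([np.ones(n), female, urban, female * urban])
a1, a2, a3, a4 = np.linalg.lstsq(X, y, rcond=None)[0]

def cell_mean(f, u):
    """Sample mean of y for one gender/residence cell."""
    return y[(female == f) & (urban == u)].mean()

# Cell (female, urban) -> value implied by the interaction model
implied = {(0, 0): a1, (1, 0): a1 + a2, (0, 1): a1 + a3, (1, 1): a1 + a2 + a3 + a4}
for (f, u), value in implied.items():
    print(f"female={f} urban={u}: model {value:.3f}, cell mean {cell_mean(f, u):.3f}")
```

Without the product term, the same regression would force the urban-rural gap to be identical for both sexes, which is exactly the restriction of model (9.6.1).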


Example 9.5 Literacy rate in Relation to Gender, Area of Residence and Per capita Net State 
Domestic Product 


Let us first present the regression results based on model (9.6.1). Using the data that were used to estimate 
regression (9.3.1), we obtained the following results: 


10 If we were to define education as less than high school, high school, and more than high school, we could then use two
dummies to represent the three classes.
$$\hat{Y}_i = 66.32 - 16.32\,D_{2i} + 16.00\,D_{3i} + 0.00033\,X_i$$

t = (20.91)*  (-8.37)*  (8.20)*  (3.54)*          (9.6.4)
R² = 0.6754   n = 76
where * indicates p values less than 5 percent.
The reader can check that the differential intercept coefficients are statistically significant, and that PNSDP
has a small (per rupee) but statistically significant positive effect on the literacy rate, presumably because
economic development in a state goes hand in hand with other social development.


As Eq. (9.6.4) shows, ceteris paribus, the average literacy rate of females is lower by about 16.32 percentage points,
and the average literacy rate in urban areas is higher by about 16 percentage points.
We now consider the results of model (9.6.2), which includes the interaction dummy:


$$\hat{Y}_i = 68.56 - 20.79\,D_{2i} + 11.53\,D_{3i} + 8.95\,(D_{2i} D_{3i}) + 0.00033\,X_i$$

t = (…)*  (-7.77)*  (…)*  (2.37)*  (3.65)*          (9.6.5)
where * indicates p values less than 5 percent.

As you can see, the two additive dummies and the interactive dummy are all statistically significant at the
conventional 5 percent level. Interpreting the results, we find that, holding per capita net state domestic
product constant, adding the three dummy coefficients gives about -0.31 (= -20.79 + 11.53 + 8.95),
which means that the mean literacy rate of urban females is lower than that of rural males (the base category)
by about 0.31 percentage point, a value between -20.79 (gender difference alone) and 11.53 (place of residence alone).
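The same arithmetic can be laid out for all four categories at once. The short Python sketch below simply plugs the point estimates of Eq. (9.6.5) into Eq. (9.6.2), holding PNSDP fixed so that its common term drops out of every comparison; the group labels are ours.

```python
# Point estimates from Eq. (9.6.5); the PNSDP term 0.00033*X is common to all groups
a1, a2, a3, a4 = 68.56, -20.79, 11.53, 8.95

intercepts = {
    "rural male (base)": a1,
    "rural female": a1 + a2,
    "urban male": a1 + a3,
    "urban female": a1 + a2 + a3 + a4,
}
for group, value in intercepts.items():
    print(f"{group:18s} {value:6.2f}  difference from base: {value - a1:+.2f}")
```

Reading down the last column shows how the interaction coefficient 8.95 offsets most of the gender gap for urban residents.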


The preceding example clearly reveals the role of interaction dummies when two or more qualitative
regressors are included in the model. It is important to note that in model (9.6.5) we are assuming that the
rate of increase of the literacy rate with respect to per capita net state domestic product remains constant across
gender and area of residence. But this may not be the case. If you want to test for this, you will have to introduce
differential slope coefficients (see Exercise 9.25).


9.7 The Use of Dummy Variables in Seasonal Analysis 


Many economic time series based on monthly or quarterly data exhibit seasonal patterns (regular oscillatory
movements). Examples are sales of department stores at Christmas and other major holiday times, demand
for money (or cash balances) by households at holiday times, demand for ice cream and soft drinks during
summer, prices of crops right after the harvesting season, demand for air travel, etc. Often it is desirable to
remove the seasonal factor, or component, from a time series so that one can concentrate on the other
components, such as the trend.11 The process of removing the seasonal component from a time series is known as
deseasonalization or seasonal adjustment, and the time series thus obtained is called the deseasonalized,
or seasonally adjusted, time series.

There are several methods of deseasonalizing a time series, but we will consider only one of these methods,
namely, the method of dummy variables.12 To illustrate how dummy variables can be used to deseasonalize
economic time series, consider the data given in Table 9.4. This table gives quarterly data for the years
1978-1985 on the sales of four major appliances (dishwashers, garbage disposers, refrigerators, and washing
machines), all in thousands of units. The table also gives data on durable goods expenditure in billions of
1982 dollars.


11 A time series may contain four components: (1) seasonal, (2) cyclical, (3) trend, and (4) strictly random.
12 For the various methods of seasonal adjustment, see, for instance, Francis X. Diebold, Elements of Forecasting, 2d ed.,
South-Western Publishing, 2001, Chapter 5.
Table 9.4 Quarterly Data on Appliance Sales (in thousands) and Expenditure on Durable Goods (1978-I to 1985-IV)

Quarter    DISH  DISP  FRIG  WASH    DUR     Quarter    DISH  DISP  FRIG  WASH    DUR
1978-I      841   798  1317  1271  252.6     1982-I      480   706   943  1036  247.7
1978-II     957   837  1615  1295  272.4     1982-II     530   582  1175  1019  249.1
1978-III    999   821  1662  1313  270.9     1982-III    557   659  1269  1047  251.8
1978-IV     960   858  1295  1150  273.9     1982-IV     602   837   973   918  262.0
1979-I      894   837  1271  1289  268.9     1983-I      658   867  1102  1137  263.3
1979-II     851   838  1555  1245  262.9     1983-II     749   860  1344  1167  280.0
1979-III    863   832  1639  1270  270.9     1983-III    827   918  1641  1230  288.5
1979-IV     878   818  1238  1103  263.4     1983-IV     858  1017  1225  1081  300.5
1980-I      792   868  1277  1273  260.6     1984-I      808  1063  1429  1326  312.6
1980-II     589   628  1258  1031  231.9     1984-II     840   955  1699  1228  322.5
1980-III    657   662  1417  1143  242.7     1984-III    893   973  1749  1297  324.3
1980-IV     699   822  1185  1101  248.6     1984-IV     950  1096  1117     …  333.1
1981-I      675   871  1196  1181  258.7     1985-I      838  1086  1242  1292  344.8
1981-II     652   791  1410  1116  248.4     1985-II     884   990  1684  1342  350.3
1981-III    628   759  1417  1190  255.5     1985-III    905  1028  1764  1323  369.1
1981-IV     529   734   919  1125  240.4     1985-IV     909  1003  1328  1274  356.4

Note: DISH = dishwashers; DISP = garbage disposers; FRIG = refrigerators; WASH = washing machines; DUR = durable
goods expenditure, billions of 1982 dollars.

Source: Business Statistics and Survey of Current Business, Department of Commerce (various issues).


To illustrate the dummy technique, we will consider only the sales of refrigerators over the sample period.
But first let us look at the data, which are shown in Figure 9.4. This figure suggests that perhaps there
is a seasonal pattern in the data associated with the various quarters. To see if this is the case, consider the
following model:

$$Y_t = \alpha_1 D_{1t} + \alpha_2 D_{2t} + \alpha_3 D_{3t} + \alpha_4 D_{4t} + u_t \qquad (9.7.1)$$

where $Y_t$ = sales of refrigerators (in thousands) and the D's are the dummies, taking a value of 1 in the
relevant quarter and 0 otherwise. Note that to avoid the dummy variable trap, we are assigning a dummy
to each quarter of the year, but omitting the intercept term. If there is any seasonal effect in a given quarter,
that will be indicated by a statistically significant t value of the dummy coefficient for that quarter.13

[Figure 9.4 Sales of refrigerators 1978-1985 (quarterly); refrigerator sales (thousands) on the vertical axis, year on the horizontal axis.]

Notice that in Eq. (9.7.1) we are regressing Y effectively on an intercept, except that we allow for a 
different intercept in each season (i.e., quarter). As a result, the dummy coefficient of each quarter will give 
us the mean refrigerator sales in each quarter or season (why?).


Example 9.6 Seasonality in Refrigerator Sales 


From the data on refrigerator sales given in Table 9.4, we obtain the following regression results: 
$$\hat{Y}_t = 1{,}222.125\,D_{1t} + 1{,}467.500\,D_{2t} + 1{,}569.750\,D_{3t} + 1{,}160.000\,D_{4t}$$

t = (20.3720)  (24.4622)  (26.1666)  (19.3364)          (9.7.2)
R² = 0.5317


13 Note a technical point. This method of assigning a dummy to each quarter assumes that the seasonal factor, if present, is
deterministic and not stochastic. We will revisit this topic when we discuss time series econometrics in Part V of this book.
Note: We have not given the standard errors of the estimated coefficients, as each standard error is equal to
59.9904. They are identical because the dummies take only the values 1 and 0 and every quarter contains the same
number of observations (eight), so each standard error equals $\hat{\sigma}/\sqrt{8}$.

The estimated α coefficients in Eq. (9.7.2) represent the average, or mean, sales of refrigerators (in thousands
of units) in each season (i.e., quarter). Thus, the average sale of refrigerators in the first quarter, in thousands 
of units, is about 1,222, that in the second quarter about 1,468, that in the third quarter about 1,570, and 
that in the fourth quarter about 1,160. 
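Because the four quarterly dummies partition the sample, OLS on Eq. (9.7.1) simply averages the observations quarter by quarter. The Python sketch below takes the refrigerator sales of Table 9.4 in time order (1978-I to 1985-IV) and reproduces the coefficients of Eq. (9.7.2) together with the common standard error of 59.9904 mentioned in the note above; the variable names are ours.

```python
import numpy as np

# Refrigerator sales (thousands), 1978-I to 1985-IV, from Table 9.4
frig = np.array([1317, 1615, 1662, 1295, 1271, 1555, 1639, 1238,
                 1277, 1258, 1417, 1185, 1196, 1410, 1417, 919,
                 943, 1175, 1269, 973, 1102, 1344, 1641, 1225,
                 1429, 1699, 1749, 1117, 1242, 1684, 1764, 1328], dtype=float)
quarter = np.tile(np.arange(4), 8)            # 0 = Q1, 1 = Q2, 2 = Q3, 3 = Q4

# Eq. (9.7.1): one dummy per quarter and no intercept
D = (quarter[:, None] == np.arange(4)).astype(float)
alpha, *_ = np.linalg.lstsq(D, frig, rcond=None)
print(alpha)                                  # approximately 1222.125, 1467.5, 1569.75, 1160.0

# Every coefficient has standard error sigma_hat / sqrt(8)
resid = frig - D @ alpha
sigma_hat = np.sqrt(resid @ resid / (len(frig) - 4))   # 32 observations, 4 parameters
se = sigma_hat / np.sqrt(8)
print(round(se, 4))                           # about 59.9904
```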


Table 9.5 U.S. Refrigerator Sales (thousands), 1978-1985 (quarterly)

Quarter    FRIG    DUR  D2  D3  D4     Quarter    FRIG    DUR  D2  D3  D4
1978-I     1317  252.6   0   0   0     1982-I      943  247.7   0   0   0
1978-II    1615  272.4   1   0   0     1982-II    1175  249.1   1   0   0
1978-III   1662  270.9   0   1   0     1982-III   1269  251.8   0   1   0
1978-IV    1295  273.9   0   0   1     1982-IV     973  262.0   0   0   1
1979-I     1271  268.9   0   0   0     1983-I     1102  263.3   0   0   0
1979-II    1555  262.9   1   0   0     1983-II    1344  280.0   1   0   0
1979-III   1639  270.9   0   1   0     1983-III   1641  288.5   0   1   0
1979-IV    1238  263.4   0   0   1     1983-IV    1225  300.5   0   0   1
1980-I     1277  260.6   0   0   0     1984-I     1429  312.6   0   0   0
1980-II    1258  231.9   1   0   0     1984-II    1699  322.5   1   0   0
1980-III   1417  242.7   0   1   0     1984-III   1749  324.3   0   1   0
1980-IV    1185  248.6   0   0   1     1984-IV    1117  333.1   0   0   1
1981-I     1196  258.7   0   0   0     1985-I     1242  344.8   0   0   0
1981-II    1410  248.4   1   0   0     1985-II    1684  350.3   1   0   0
1981-III   1417  255.5   0   1   0     1985-III   1764  369.1   0   1   0
1981-IV     919  240.4   0   0   1     1985-IV    1328  356.4   0   0   1

Note: FRIG = refrigerator sales, thousands.
DUR = durable goods expenditure, billions of 1982 dollars.
D2 = 1 in the second quarter, 0 otherwise.
D3 = 1 in the third quarter, 0 otherwise.
D4 = 1 in the fourth quarter, 0 otherwise.
Source: Business Statistics and Survey of Current Business, Department of Commerce (various issues).


Incidentally, instead of assigning a dummy for each quarter and suppressing the intercept term to avoid the
dummy variable trap, we could assign only three dummies and include the intercept term. Suppose we treat
the first quarter as the reference quarter and assign dummies to the second, third, and fourth quarters. This
produces the following regression results (see Table 9.5 for the data setup):

$$\hat{Y}_t = 1{,}222.1250 + 245.3750\,D_{2t} + 347.6250\,D_{3t} - 62.1250\,D_{4t}$$

t = (20.3720)*  (2.8922)*  (4.0974)*  (-0.7322)**          (9.7.3)
R² = 0.5318

where * indicates p values less than 5 percent and ** indicates p values greater than 5 percent.

Since we are treating the first quarter as the benchmark, the coefficients attached to the various dummies 
are now differential intercepts, showing by how much the average value of Y in the quarter that receives a 
dummy value of 1 differs from that of the benchmark quarter. Put differently, the coefficients on the seasonal 
dummies will give the seasonal increase or decrease in the average value of Y relative to the base season. If 
you add the various differential intercept values to the benchmark average value of 1,222.125, you will get 
the average value for the various quarters. Doing so, you will reproduce exactly Eq. (9.7.2), except for the 
rounding errors. 

But now you will see the value of treating one quarter as the benchmark quarter, for Eq. (9.7.3) shows that 
the average value of Y for the fourth quarter is not statistically different from the average value for the first 
quarter, as the dummy coefficient for the fourth quarter is not statistically significant. Of course, your answer 
will change, depending on which quarter you treat as the benchmark quarter, but the overall conclusion will 
not change. 
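The benchmark-quarter version can be checked the same way. In the Python sketch below (same Table 9.4 refrigerator data; the variable names are ours), the intercept reproduces the first-quarter mean, the three dummy coefficients reproduce the differences between the other quarterly means and that benchmark, and adding them back recovers Eq. (9.7.2).

```python
import numpy as np

# Refrigerator sales (thousands), 1978-I to 1985-IV, from Table 9.4
frig = np.array([1317, 1615, 1662, 1295, 1271, 1555, 1639, 1238,
                 1277, 1258, 1417, 1185, 1196, 1410, 1417, 919,
                 943, 1175, 1269, 973, 1102, 1344, 1641, 1225,
                 1429, 1699, 1749, 1117, 1242, 1684, 1764, 1328], dtype=float)
quarter = np.tile(np.arange(4), 8)            # 0 = Q1 (benchmark), ..., 3 = Q4

# Eq. (9.7.3): intercept plus dummies for the second, third, and fourth quarters
X = np.column_stack([np.ones(len(frig))] +
                    [(quarter == q).astype(float) for q in (1, 2, 3)])
coef, *_ = np.linalg.lstsq(X, frig, rcond=None)
print(coef)              # approximately 1222.125, 245.375, 347.625, -62.125

# Adding each differential intercept to the benchmark gives back Eq. (9.7.2)
quarter_means = coef[0] + np.r_[0.0, coef[1:]]
print(quarter_means)
```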
How do we obtain the deseasonalized time series of refrigerator sales? This can be done easily. You estimate
the values of Y from model (9.7.2) (or [9.7.3]) for each observation and subtract them from the actual values
of Y, that is, you obtain $(Y_t - \hat{Y}_t)$, which are simply the residuals from the regression (9.7.2). We show them in
Table 9.6.14 To these residuals, we have to add the mean of the Y series to get the deseasonalized values in the
original units of Y.
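The recipe takes only a few lines. The Python sketch below (Table 9.4 refrigerator data again; names are illustrative) computes the fitted values of Eq. (9.7.2) as quarterly means, forms the residuals shown in Table 9.6, and adds back the overall mean. Under the additive decomposition assumed in footnote 14, the resulting series has the same average in every quarter, which is what removing a deterministic seasonal component amounts to here.

```python
import numpy as np

# Refrigerator sales (thousands), 1978-I to 1985-IV, from Table 9.4
frig = np.array([1317, 1615, 1662, 1295, 1271, 1555, 1639, 1238,
                 1277, 1258, 1417, 1185, 1196, 1410, 1417, 919,
                 943, 1175, 1269, 973, 1102, 1344, 1641, 1225,
                 1429, 1699, 1749, 1117, 1242, 1684, 1764, 1328], dtype=float)
quarter = np.tile(np.arange(4), 8)

seasonal_means = np.array([frig[quarter == q].mean() for q in range(4)])
fitted = seasonal_means[quarter]          # fitted values of Eq. (9.7.2)
residuals = frig - fitted                 # the residuals of Table 9.6
deseasonalized = residuals + frig.mean()  # restore the overall level of the series

print(residuals[:4])                      # approximately 94.875, 147.5, 92.25, 135.0
print(np.round(deseasonalized[:4], 3))
```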


Table 9.6 Refrigerator Sales Regression: Actual, Fitted, and Residual Values (Eq. 9.7.3)

Period      Actual     Fitted    Residuals
1978-I        1317    1222.12       94.875
1978-II       1615    1467.50      147.500
1978-III      1662    1569.75       92.250
1978-IV       1295    1160.00      135.000
1979-I        1271    1222.12       48.875
1979-II       1555    1467.50       87.500
1979-III      1639    1569.75       69.250
1979-IV       1238    1160.00       78.000
1980-I        1277    1222.12       54.875
1980-II       1258    1467.50     -209.500
1980-III      1417    1569.75     -152.750
1980-IV       1185    1160.00       25.000
1981-I        1196    1222.12      -26.125
1981-II       1410    1467.50      -57.500
1981-III      1417    1569.75     -152.750
1981-IV        919    1160.00     -241.000
1982-I         943    1222.12     -279.125
1982-II       1175    1467.50     -292.500
1982-III      1269    1569.75     -300.750
1982-IV        973    1160.00     -187.000
1983-I        1102    1222.12     -120.125
1983-II       1344    1467.50     -123.500
1983-III      1641    1569.75       71.250
1983-IV       1225    1160.00       65.000
1984-I        1429    1222.12      206.875
1984-II       1699    1467.50      231.500
1984-III      1749    1569.75      179.250
1984-IV       1117    1160.00      -43.000
1985-I        1242    1222.12       19.875
1985-II       1684    1467.50      216.500
1985-III      1764    1569.75      194.250
1985-IV       1328    1160.00      168.000

[Residuals graph: the residuals plotted over time around a zero line.]


14 Of course, this assumes that the dummy variables technique is an appropriate method of deseasonalizing a time series and
that a time series (TS) can be represented as: TS = s + c + t + u, where s represents the seasonal, t the trend, c the cyclical,
and u the random component. However, if the time series is of the form TS = (s)(c)(t)(u), where the four components enter
multiplicatively, the preceding method of deseasonalization is inappropriate, for that method assumes that the four
components of a time series are additive. But we will have more to say about this topic in the chapters on time series econometrics.
What do these residuals represent? They represent the remaining components of the refrigerator time
series, namely, the trend, cycle, and random components (but see the caution given in footnote 14).

Since models (9.7.2) and (9.7.3) do not contain any covariates, will the picture change if we bring a
quantitative regressor into the model? Since expenditure on durable goods is an important factor influencing
the demand for refrigerators, let us expand model (9.7.3) by bringing in this variable. The data for durable
goods expenditure in billions of 1982 dollars are already given in Table 9.4. This is our (quantitative) X variable
in the model. The regression results are as follows:


$$\hat{Y}_t = 456.2440 + 242.4976\,D_{2t} + 325.2643\,D_{3t} - 86.0804\,D_{4t} + 2.7734\,X_t$$

t = (2.5593)*  (3.6951)*  (4.9421)*  (-1.3073)**  (4.4496)*          (9.7.4)
R² = 0.7298


where * indicates p values less than 5 percent and ** indicates p values greater than 5 percent. 

Again, keep in mind that we are treating the first quarter as our base. As in Eq. (9.7.3), we see that the
differential intercept coefficients for the second and third quarters are statistically different from that of the
first quarter, but the intercepts of the fourth quarter and the first quarter are statistically about the same. The
coefficient of X (durable goods expenditure) of about 2.77 tells us that, allowing for seasonal effects, if
expenditure on durable goods goes up by $1 billion (in 1982 dollars), sales of refrigerators go up, on average,
by about 2.77 thousand units, that is, roughly 2,800 refrigerators; bear in mind that refrigerators are measured
in thousands of units and X in billions of (1982) dollars.

An interesting question here is: Just as sales of refrigerators exhibit seasonal patterns, would not expen- 
diture on durable goods also exhibit seasonal patterns? How then do we take into account seasonality in X? 
The interesting thing about Eq. (9.7.4) is that the dummy variables in that model not only remove the
seasonality in Y but also the seasonality, if any, in X. (This follows from a well-known theorem in statistics, known as
the Frisch-Waugh theorem.15) So to speak, we kill (deseasonalize) two birds (two series) with one stone
(the dummy technique).

If you want an informal proof of the preceding statement, just follow these steps: (1) Run the regression
of Y on the dummies as in Eq. (9.7.2) or Eq. (9.7.3) and save the residuals, say, $S_1$; these residuals represent
deseasonalized Y. (2) Run a similar regression for X and obtain the residuals from this regression, say, $S_2$;
these residuals represent deseasonalized X. (3) Regress $S_1$ on $S_2$. You will find that the slope coefficient in this
regression is precisely the coefficient of X in the regression (9.7.4).
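The three steps can be verified numerically. The Python sketch below uses the Table 9.4 refrigerator sales as Y but, to keep it short and independent of the durable-goods figures, a simulated quarterly regressor with its own trend and seasonal pattern as X, so the slope it prints is not the 2.7734 of Eq. (9.7.4). It fits the full regression with seasonal dummies and X, then runs steps (1) to (3), and compares the two slopes; `ols` is a helper defined here, not a library function.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Refrigerator sales (thousands), 1978-I to 1985-IV, from Table 9.4
frig = np.array([1317, 1615, 1662, 1295, 1271, 1555, 1639, 1238,
                 1277, 1258, 1417, 1185, 1196, 1410, 1417, 919,
                 943, 1175, 1269, 973, 1102, 1344, 1641, 1225,
                 1429, 1699, 1749, 1117, 1242, 1684, 1764, 1328], dtype=float)
quarter = np.tile(np.arange(4), 8)

# Simulated X with a trend and a seasonal pattern (a stand-in for DUR, not Table 9.4 data)
rng = np.random.default_rng(2)
x = 250 + 3 * np.arange(32) + np.array([0.0, 8.0, 5.0, -4.0])[quarter] + rng.normal(0, 5, 32)

# Intercept plus second-, third-, and fourth-quarter dummies, as in Eq. (9.7.3)
Q = np.column_stack([np.ones(32)] + [(quarter == q).astype(float) for q in (1, 2, 3)])

# Full regression in the style of Eq. (9.7.4): seasonal dummies plus X
slope_full = ols(np.column_stack([Q, x]), frig)[-1]

# Frisch-Waugh steps (1)-(3)
s1 = frig - Q @ ols(Q, frig)          # deseasonalized Y
s2 = x - Q @ ols(Q, x)                # deseasonalized X
slope_fwl = ols(s2[:, None], s1)[0]   # regress S1 on S2 (both residual series have zero mean)

print(slope_full, slope_fwl)
```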


9.8 Piecewise Linear Regression 


To illustrate yet another use of dummy variables, consider Figure 9.5, which shows how a hypothetical
company remunerates its sales representatives. It pays commissions based on sales in such a manner that up
to a certain level, the target, or threshold, level $X^*$, there is one (stochastic) commission structure and beyond
that level another. (Note: Besides sales, other factors affect sales commission. Assume that these other factors
are represented by the stochastic disturbance term.) More specifically, it is assumed that sales commission
increases linearly with sales until the threshold level $X^*$, after which it continues to increase linearly with
sales but at a much steeper rate. Thus, we have a piecewise linear regression consisting of two linear pieces
or segments, which are labeled I and II in Figure 9.5, and the commission function changes its slope at the
threshold value. Given the data on commission, sales, and the value of the threshold level $X^*$, the technique
of dummy variables can be used to estimate the (differing) slopes of the two segments of the piecewise linear
regression shown in Figure 9.5. We proceed as follows:

$$Y_i = \alpha_1 + \beta_1 X_i + \beta_2 (X_i - X^*) D_i + u_i \qquad (9.8.1)$$


15 For proof, see Adrian C. Darnell, A Dictionary of Econometrics, Edward Elgar, Lyme, U.K., 1995, pp. 150-152.
[Figure 9.5 Hypothetical relationship between sales commission (vertical axis) and sales volume X (horizontal axis), with the threshold X* marked on the sales axis. (Note: The intercept on the Y axis denotes minimum guaranteed commission.)]


where $Y_i$ = sales commission
$X_i$ = volume of sales generated by the salesperson
$X^*$ = threshold value of sales, also known as a knot (known in advance)16
$D_i$ = 1 if $X_i > X^*$
    = 0 if $X_i < X^*$

Assuming $E(u_i) = 0$, we see at once that

$$E(Y_i \mid D_i = 0, X_i, X^*) = \alpha_1 + \beta_1 X_i \qquad (9.8.2)$$

which gives the mean sales commission up to the target level $X^*$, and

$$E(Y_i \mid D_i = 1, X_i, X^*) = \alpha_1 - \beta_2 X^* + (\beta_1 + \beta_2) X_i \qquad (9.8.3)$$

which gives the mean sales commission beyond the target level $X^*$.

Thus, $\beta_1$ gives the slope of the regression line in segment I, and $\beta_1 + \beta_2$ gives the slope of the regression
line in segment II of the piecewise linear regression shown in Figure 9.5. A test of the hypothesis that there is
no break in the regression at the threshold value $X^*$ can be conducted easily by noting the statistical
significance of the estimated differential slope coefficient $\hat{\beta}_2$ (see Figure 9.6).

Incidentally, the piecewise linear regression we have just discussed is an example of a more general class
of functions known as spline functions.17


16 The threshold value may not always be apparent, however. An ad hoc approach is to plot the dependent variable against
the explanatory variable(s) and observe if there seems to be a sharp change in the relation after a given value of X (i.e., X*).
An analytical approach to finding the break point can be found in the so-called switching regression models. But this is
an advanced topic and a textbook discussion may be found in Thomas Fomby, R. Carter Hill, and Stanley Johnson, Advanced
Econometric Methods, Springer-Verlag, New York, 1984, Chapter 14.

17 For an accessible discussion on splines (i.e., piecewise polynomials of order k), see Douglas C. Montgomery, Elizabeth A.
Peck, and G. Geoffrey Vining, Introduction to Linear Regression Analysis, John Wiley & Sons, 3d ed., New York, 2001,
pp. 228-230.
[Figure 9.6 Parameters of the piecewise linear regression: sales commission (vertical axis) against X (sales) (horizontal axis).]


Example 9.7 Total Cost in Relation to Output 


As an example of the application of the piecewise linear regression, consider the hypothetical total cost-total
output data given in Table 9.7. We are told that the total cost may change its slope at the output level of
5,500 units.

Letting Y in Eq. (9.8.1) represent total cost and X total output, we obtain the following results:

$$\hat{Y}_i = -145.72 + 0.2791\,X_i + 0.0945\,(X_i - X^*)D_i$$

t = (-0.8245)  (6.0669)  (1.1447)          (9.8.4)
R² = 0.9737   X* = 5,500
As these results show, the marginal cost of production is about 28 paisa per unit, and although it is about
37 paisa (28 + 9) for output over 5,500 units, the difference between the two is not statistically significant
because the differential slope coefficient is not significant at, say, the 5 percent level. For all practical
purposes, then, one can regress total cost on total output, dropping the dummy variable.


Table 9.7 Hypothetical Data on Output and Total Cost

Total Cost, Rupees    Output, Units
       256                1,000
       414                2,000
       634                3,000
       778                4,000
     1,003                5,000
     1,839                6,000
     2,081                7,000
     2,423                8,000
     2,734                9,000
     2,914               10,000
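Regression (9.8.4) is an ordinary OLS fit once the extra regressor $(X_i - X^*)D_i$ has been constructed. The Python sketch below (NumPy only; variable names are ours) builds that term from the Table 9.7 data with $X^* = 5{,}500$, reproduces the point estimates of Eq. (9.8.4), and checks that the two segments implied by Eqs. (9.8.2) and (9.8.3) meet at the knot.

```python
import numpy as np

# Table 9.7: output (units) and total cost (rupees)
output = np.arange(1_000, 10_001, 1_000, dtype=float)
cost = np.array([256, 414, 634, 778, 1003, 1839, 2081, 2423, 2734, 2914], dtype=float)
x_star = 5_500.0                                  # knot, known in advance

d = (output > x_star).astype(float)               # D_i = 1 beyond the knot
X = np.column_stack([np.ones_like(output), output, (output - x_star) * d])
a1, b1, b2 = np.linalg.lstsq(X, cost, rcond=None)[0]

print(f"intercept {a1:.2f}, slope I {b1:.4f}, slope II {b1 + b2:.4f}")
# intercept -145.72, slope I 0.2791, slope II 0.3736

# Segment I and segment II give the same value at the knot (no jump)
at_knot_I = a1 + b1 * x_star
at_knot_II = (a1 - b2 * x_star) + (b1 + b2) * x_star
```

Because the dummy enters only through $(X_i - X^*)D_i$, the fitted function bends at the knot but cannot jump there; allowing a jump would require an additional intercept dummy $D_i$.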
9.9 Panel Data Regression Models


Recall that in Chapter 1 we discussed a variety of data that are available for empirical analysis, such as 
cross-section, time series, pooled (combination of time series and cross-section data), and panel data. The 
technique of dummy variables can easily be extended to pooled and panel data. Since the use of panel data is
becoming increasingly common in applied work, we will consider this topic in some detail in Chapter 16. 


9.10 Some Technical Aspects of the Dummy Variable Technique 


The Interpretation of Dummy Variables in Semilogarithmic Regressions 


In Chapter 6 we discussed the log-lin models, where the regressand is logarithmic and the regressors are
linear. In such a model, the slope coefficients of the regressors give the semielasticity, that is, the relative change
(multiplied by 100, the percentage change) in the regressand for a unit change in the regressor. This is so only if
the regressor is quantitative.
What happens if a regressor is a dummy variable? To be specific, consider the following model: 


$$\ln Y_i = \beta_1 + \beta_2 D_i + u_i \qquad (9.10.1)$$

where Y = literacy rate (%) and D = 1 for female and 0 for male.

How do we interpret such a model? Assuming $E(u_i) = 0$, we obtain:

Literacy rate for males:

$$E(\ln Y_i \mid D_i = 0) = \beta_1 \qquad (9.10.2)$$

Literacy rate for females:

$$E(\ln Y_i \mid D_i = 1) = \beta_1 + \beta_2 \qquad (9.10.3)$$


Therefore, the intercept $\beta_1$ gives the mean log literacy rate for males, and the "slope" coefficient $\beta_2$ gives
the difference between the mean log literacy rates of females and males. This is a rather awkward way of stating
things. But if we take the antilog of $\beta_1$, what we obtain is not the mean literacy rate for males but their median
literacy rate (provided ln Y is symmetrically distributed, as it is, for example, under normality). As you know, mean,
median, and mode are the three measures of central tendency of a random variable. And if we take the antilog of
$(\beta_1 + \beta_2)$, we obtain the median literacy rate for females.
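Why the antilog of a mean log gives a median can be seen by simulation. The Python sketch below assumes, purely for illustration, that ln Y is normally distributed (the parameters 4.4 and 0.3 are made up): exponentiating the mean of ln Y lands on the median of Y, while the arithmetic mean of Y is noticeably larger.

```python
import numpy as np

rng = np.random.default_rng(3)
log_y = rng.normal(loc=4.4, scale=0.3, size=100_000)  # hypothetical ln Y
y = np.exp(log_y)

antilog_of_mean_log = np.exp(log_y.mean())
print(antilog_of_mean_log, np.median(y), y.mean())
# the first two are close to exp(4.4), about 81.5; the mean is near exp(4.4 + 0.3**2 / 2)
```

If ln Y were skewed rather than symmetric, the antilog of its mean would no longer coincide with the median of Y.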


Example 9.8 Logarithm of Literacy Rate in Relation to Gender 
To illustrate Eq. (9.10.1), we use the data that underlie Example 9.2. The regression results based on 76
observations are as follows:

$$\widehat{\ln Y_i} = 4.42 - 0.24\,D_i$$

t = (147.85)*  (-5.60)*          (9.10.4)
R² = 0.2978


where * indicates p values that are practically zero.

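Taking the antilog of the intercept, we find 83.35 (%), which is the median literacy rate for males, and taking
the antilog of the sum of the two coefficients (about 4.19), we obtain 65.77 (%), which is the median literacy rate
for females. (These antilogs are computed from the unrounded estimates; the antilogs of the rounded values 4.42
and 4.18 shown in Eq. (9.10.4) would differ slightly.) Thus, the female literacy rate is lower by about 21.09 percent
compared with that of males {[(83.35 - 65.77)/83.35] × 100}.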

Interestingly, we can obtain semielasticities for dummy regressors directly by the device suggested by
Halvorsen and Palmquist.18 Take the antilog (to base e) of the estimated dummy coefficient, subtract 1 from
it, and multiply the difference by 100. (For the underlying logic, see Appendix 9.4.1.) Therefore, if you take the
antilog of -0.24, you will obtain about 0.79. Subtracting 1 from this gives about -0.21. After multiplying this by 100,
we get about -21 percent (-21.09 percent with the unrounded coefficient), suggesting that the median literacy rate
of females (D = 1) is lower than that of their male counterparts by about 21 percent, the same as we obtained
previously, save for rounding errors.
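The Halvorsen-Palmquist adjustment is the one-line formula $100\,(e^{\hat{\beta}_2} - 1)$. The Python sketch below (the function name is ours) applies it to the rounded coefficient -0.24 from Eq. (9.10.4) and compares the result with the naive reading 100 × (-0.24) and with the percentage gap between the two medians reported above. From the rounded coefficient the formula gives about -21.3 percent; the -21.09 figure reflects the unrounded estimate.

```python
import math

def dummy_pct_effect(b):
    """Percentage change in Y when a dummy switches from 0 to 1 in a log-lin model."""
    return 100 * (math.exp(b) - 1)

b2 = -0.24                                     # rounded estimate from Eq. (9.10.4)
naive = 100 * b2                               # reading the coefficient as a percentage
adjusted = dummy_pct_effect(b2)                # Halvorsen-Palmquist adjustment
from_medians = (65.77 / 83.35 - 1) * 100       # gap between the two median literacy rates

print(f"naive {naive:.2f}%, adjusted {adjusted:.2f}%, from medians {from_medians:.2f}%")
# naive -24.00%, adjusted -21.34%, from medians -21.09%
```

The gap between the naive and adjusted figures grows with the size of the coefficient; for coefficients close to zero the two readings nearly coincide.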


18 Robert Halvorsen and Raymond Palmquist, "The Interpretation of Dummy Variables in Semilogarithmic Equations,"
American Economic Review, vol. 70, no. 3, 1980, pp. 474-475.
Dummy Variables and Heteroscedasticity


Let us revisit our savings-income regression for India for the periods 1974-75 to 1988-89 and 1989-90 to
1995-96 and for the entire period 1974-75 to 1995-96. In testing for structural stability using the dummy
technique, we assumed that $\mathrm{var}(u_{1t}) = \mathrm{var}(u_{2t}) = \sigma^2$, that is, that the error variances in the two
periods were the same. This was also the assumption underlying the Chow test. If this assumption is not valid,
that is, if the error variances in the two subperiods are different, it is quite possible to draw misleading
conclusions. Therefore, one must first check on the equality of variances in the subperiods, using suitable
statistical techniques. Although we will discuss this topic more thoroughly in the chapter on heteroscedasticity, in
Chapter 8 we showed how the F test can be used for this purpose.19 (See our discussion of the Chow test in
that chapter.) As we showed there, it seems the error variances in the two periods are not the same. Hence, the
results of both the Chow test and the dummy variable technique presented before may not be entirely reliable.
Of course, our purpose here is to illustrate the various techniques that one can use to handle a problem (e.g.,
the problem of structural stability). In any particular application, these techniques may not be valid. But that
is par for the course for most statistical techniques. Of course, one can take appropriate remedial actions to
resolve the problem, as we will do in the chapter on heteroscedasticity later (however, see Exercise 9.28).


Dummy Variables and Autocorrelation 


Besides homoscedasticity, the classical linear regression model assumes that the error terms in the regression
model are uncorrelated with one another. But what happens if that is not the case, especially in models involving dummy
regressors? Since we will discuss the topic of autocorrelation in depth in the chapter on autocorrelation, we 
will defer the answer to this question until then. 


What Happens if the Dependent Variable is a Dummy Variable? 


So far we have considered models in which the regressand is quantitative and the regressors are quanti- 
tative or qualitative or both. But there are occasions where the regressand can also be qualitative or dummy. 
Consider, for example, the decision of a worker to participate in the labor force. The decision to participate 
is of the yes or no type, yes if the person decides to participate and no otherwise. Thus, the labor force 
participation variable is a dummy variable. Of course, the decision to participate in the labor force depends 
on several factors, such as the starting wage rate, education, and conditions in the labor market (as measured 
by the unemployment rate). 

Can we still use ordinary least squares (OLS) to estimate regression models where the regressand is 
dummy? Yes, mechanically, we can do so. But there are several statistical problems that one faces in such 
models. And since there are alternatives to OLS estimation that do not face these problems, we will discuss 
this topic in a later chapter (see Chapter 15 on logit and probit models). In that chapter we will also discuss 
models in which the regressand has more than two categories; for example, the decision to travel to work 
by car, bus, or train, or the decision to work part-time, full-time, or not work at all. Such models are called
polytomous dependent variable models in contrast to dichotomous dependent variable models in which 
the dependent variable has only two categories. 
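Chapter 15 treats these problems properly. As a quick illustration of one of them, here is a sketch with made-up data (not from the text): fitting a 0/1 regressand by OLS, the so-called linear probability model, can give fitted values below 0 or above 1, which cannot be interpreted as probabilities. The helper `ols_simple` is a hypothetical name of ours.

```python
def ols_simple(x, y):
    """OLS fit of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Made-up data: x is a continuous regressor, y = 1 if the person participates, 0 otherwise.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b1, b2 = ols_simple(x, y)
fitted = [b1 + b2 * xi for xi in x]
print(round(fitted[0], 4), round(fitted[-1], 4))  # -0.0833 1.0833
```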


¹⁹The Chow test procedure can be performed even in the presence of heteroscedasticity, but then one will have to use the
Wald test. The mathematics behind the test is somewhat involved, but we will revisit this topic in the chapter on
heteroscedasticity.
9.11 Topics for Further Study


Several topics related to dummy variables are discussed in the literature that are rather advanced, including 
(1) random, or varying, parameters models, (2) switching regression models, and (3) disequilibrium 
models. 

In the regression models considered in this text it is assumed that the parameters, the β’s, are unknown but
fixed entities. The random coefficient models (and there are several versions of them) assume that the β’s can
be random too. A major reference work in this area is by Swamy.²⁰

In the dummy variable model using both differential intercepts and slopes, it is implicitly assumed that
we know the point of break. Thus, in our savings-income example for 1974-75 to 1995-96, we divided
the period into 1974-75 to 1988-89 and 1990-91 to 1995-96, the pre- and post-liberalization periods, in
the belief that liberalization changed the relation between savings and income. Sometimes it is not easy
to pinpoint when the break has taken place. The technique of switching regression models (SRM) has
been developed for such situations. SRM treats the breakpoint as a random variable and, through an iterative
process, determines when the break might actually have taken place. The seminal work in this area is by
Goldfeld and Quandt.²¹
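The Goldfeld-Quandt switching regression method itself is beyond this chapter. As a simpler stand-in that conveys the idea of letting the data locate the break (this is a least-squares grid search, not the SRM procedure), one can fit separate regressions on each side of every candidate breakpoint and keep the one with the smallest combined residual sum of squares. The sketch below assumes a single break and uses hypothetical data; the function names are ours.

```python
def rss_line(x, y):
    """Residual sum of squares from an OLS fit of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b2 = sxy / sxx       # slope
    b1 = my - b2 * mx    # intercept
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


def best_break(x, y, min_size=3):
    """Scan candidate break points k and return (k, combined RSS).

    The first regime is observations 0..k-1 and the second is k..n-1; each
    keeps at least min_size observations so both fits are defined.
    """
    best_k, best_rss = None, float("inf")
    for k in range(min_size, len(x) - min_size + 1):
        total = rss_line(x[:k], y[:k]) + rss_line(x[k:], y[k:])
        if total < best_rss:
            best_k, best_rss = k, total
    return best_k, best_rss


# Hypothetical series whose intercept and slope both change after observation 10.
x = list(range(1, 21))
y = [1 + 0.5 * xi for xi in x[:10]] + [3 + 1.2 * xi for xi in x[10:]]
print(best_break(x, y)[0])  # 10
```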

Special estimation techniques are required to deal with what are known as disequilibrium situations, 
that is, situations where markets do not clear (i.e., demand is not equal to supply). The classic example is 
that of demand for and supply of a commodity. The demand for a commodity is a function of its price and 
other variables, and the supply of the commodity is a function of its price and other variables, some of which 
are different from those entering the demand function. Now the quantity actually bought and sold of the 
commodity may not necessarily be equal to the one obtained by equating the demand to supply, thus leading 
to disequilibrium. For a thorough discussion of disequilibrium models, the reader may refer to Quandt.²²


9.12 A Concluding Example 


We end this chapter with an example that illustrates some of the points made in this chapter. Table 9.8 
provides data on a sample of 261 workers in an industrial town in southern India in 1990. 
The variables are defined as follows:
WI = weekly wage income in rupees
AGE = age in years
Dsex = 1 for female workers and 0 for male workers
DE2 = a dummy variable taking a value of 1 for workers with an education level up to primary
DE3 = a dummy variable taking a value of 1 for workers with up to a secondary level of education
DE4 = a dummy variable taking a value of 1 for workers with higher than secondary education
DPT = a dummy variable taking a value of 1 for workers with permanent jobs and a value of 0 for
temporary workers
The reference category is male workers with no primary education and temporary jobs. Our interest is 
in finding out how weekly wages relate to age, sex, level of education, and job tenure. For this purpose, we 
estimate the following regression model: 


$$\ln WI_i = \beta_1 + \beta_2 AGE_i + \beta_3 Dsex_i + \beta_4 DE2_i + \beta_5 DE3_i + \beta_6 DE4_i + \beta_7 DPT_i + u_i$$
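The education dummies in this model follow the m − 1 rule: four education classes, three dummies, with the no-primary group as the reference class coded 0 on all three, so that β4, β5, and β6 measure differences from that group. A minimal sketch of the coding, using hypothetical text labels (`"none"`, `"primary"`, `"secondary"`, `"higher"`) that do not appear in Table 9.8:

```python
# Hypothetical education labels; "none" (no primary schooling) is the reference group.
LEVELS = ("primary", "secondary", "higher")  # coded as DE2, DE3, DE4


def education_dummies(level: str) -> dict:
    """Return the DE2/DE3/DE4 dummies for one worker's education level."""
    if level != "none" and level not in LEVELS:
        raise ValueError(f"unknown education level: {level!r}")
    # enumerate(..., start=2) yields (2, "primary"), (3, "secondary"), (4, "higher")
    return {f"DE{j}": int(level == name) for j, name in enumerate(LEVELS, start=2)}


print(education_dummies("none"))       # {'DE2': 0, 'DE3': 0, 'DE4': 0}
print(education_dummies("secondary"))  # {'DE2': 0, 'DE3': 1, 'DE4': 0}
```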


²⁰P. A. V. B. Swamy, Statistical Inference in Random Coefficient Regression Models, Springer-Verlag, Berlin, 1971.
²¹S. Goldfeld and R. Quandt, Nonlinear Methods in Econometrics, North Holland, Amsterdam, 1972.
²²Richard E. Quandt, The Econometrics of Disequilibrium, Basil Blackwell, New York, 1988.
Table 9.8 Indian Wage Earners, 1990

    WI  AGE  DE2  DE3  DE4  DPT  Dsex  |     WI  AGE  DE2  DE3  DE4  DPT  Dsex
   120   57    0    0    0    0     0  |    120   21    0    0    0    0     0
   224   48    0    0    1    1     0  |     25   18    0    0    0    0     1
   132   38    0    0    0    0     0  |     25   11    0    0    0    0     1
    75   27    0    1    0    0     0  |     30   38    0    0    0    1     1
   111   23    0    1    0    0     1  |     30    ?    0    0    0    1     1
   127   22    0    1    0    0     0  |    122   20    0    0    0    0     0
    30   18    0    0    0    0     0  |    288   50    0    1    0    1     0
    24   12    0    0    0    0     0  |     75   45    0    0    0    0     1
   119   38    0    0    0    1     0  |     79   60    0    0    0    0     0
    75   55    0    0    0    0     0  |   85.3   26    1    0    0    0     1
   324   26    0    1    0    0     0  |    350   42    0    1    0    1     0
    42   18    0    0    0    0     0  |     54   62    0    0    0    1     0
   100   32    0    0    0    0     0  |    110   23    0    0    0    0     0
   136   41    0    0    0    0     0  |    342   56    0    0    0    1     0
   107   48    0    0    0    0     0  |     77    ?    0    0    0    1     0
    50   16    1    0    0    0     1  |    370   46    0    0    0    0     0
    90   45    0    0    0    0     0  |    156   26    0    0    0    1     0
   377   46    0    0    0    1     0  |    261   23    0    0    0    0     0
   150   30    0    1    0    0     0  |     54   16    0    1    0    0     0
   162   40    0    0    0    0     0  |    130   33    0    0    0    0     0
    18   19    1    0    0    0     0  |    112   27    1    0    0    0     0
   128   25    1    0    0    0     0  |     82   22    1    0    0    0     0
  47.5   46    0    0    0    0     1  |    385   30    0    1    0    1     0
   135   25    0    1    0    0     0  |   94.3   22    0    0    1    1     1
   400   57    0    0    0    1     0  |    350   57    0    0    0    1     0
  91.8   35    0    0    1    1     0  |    108   26    0    0    0    0     0
   140   44    0    0    0    ?     0  |     20   14    0    0    0    0     0
    80   22    0    0    0    0     0  |   53.8   14    0    0    0    0     1
    30    ?    1    0    0    0     0  |    427   55    0    0    0    1     0
  40.5   37    0    0    0    0     1  |     18   12    0    0    0    0     0
    81   20    0    0    0    0     0  |    120   38    0    0    0    0     0
   105   40    0    0    0    0     0  |   40.5   17    0    0    0    0     0
   200   30    0    0    0    0     0  |    375   42    1    0    0    1     0
   140   30    0    0    0    1     0  |    120   34    0    0    0    0     0
    80   26    0    0    0    0     0  |    175   33    1    0    0    1     0
    47   41    0    0    0    0     1  |     50   26    0    0    0    0     1
   125   22    0    0    0    0     0  |    100   33    1    0    0    1     0
   500   21    0    0    0    0     0  |     25   22    0    0    0    1     1
   100   19    0    0    0    0     0  |     40   15    0    0    0    1     0
   105   35    0    0    0    0     0  |     65   14    0    0    0    1     0
   300   35    0    1    0    1     0  |   47.5   25    0    0    0    1     1
   115   33    0    1    0    1     1  |    163   25    0    0    0    1     0
   103   27    0    0    1    1     1  |    175   50    0    0    0    1     1
   190   62    1    0    0    0     0  |    150   24    0    0    0    1     1
 62.55   18    0    1    0    0     0  |    163   28    0    0    0    1     0
    50   25    1    0    0    0     0  |    163   30    1    0    0    1     0
   273   43    0    0    1    1     1  |     50   25    0    0    0    1     1
   175   40    0    1    0    1     0  |    395   45    0    1    0    1     0
   117   26    1    0    0    1     0  |    175   40    0    0    0    1     1
   950   47    0    0    1    0     0  |   87.5   25    1    0    0    0     0
   100   30    0    0    0    0     0  |     75   18    0    0    0    0     0
   140   30    0    0    0    0     0  |    163   24    0    0    0    1     0
    97   25    0    1    0    0     0  |    325   55    0    0    0    1     0
   150   36    0    0    0    0     0  |    121   27    0    1    0    0     0
    25   28    0    0    0    0     1  |    600   35    1    0    0    0     0
    15   13    0    0    0    0     1  |     52   19    0    0    0    0     0
   131   55    0    0    0    0     0  |    117   28    1    0    0    0     0
Following the literature in labor economics, we express the (natural) log of wages as a function of
the explanatory variables. As noted in Chapter 6, the size distribution of variables such as wages tends to be
skewed; logarithmic transformations of such variables reduce both skewness and heteroscedasticity.

Using EViews 6, we obtain the following regression results.


Dependent Variable: Ln(WI)
Method: Least Squares
Sample: 1 261
Included observations: 261

Variable     Coefficient  Std. Error  t-Statistic    Prob.
C               3.706872    0.113845     32.56055   0.0000
AGE             0.026549    0.003117     8.516848   0.0000
Dsex           -0.656338    0.088796    -7.391529   0.0000
DE2             0.113862    0.098542       1.1555   0.2490
DE3             0.412589    0.096383     4.280732   0.0000
DE4             0.554129    0.155224     3.569862   0.0004
DPT             0.558348    0.079990     6.980248   0.0000

R-squared             0.534969    Mean dependent var.     4.793390
Adjusted R-squared    0.523984    S.D. dependent var.     0.834277
S.E. of regression    0.575600    Akaike info criterion   1.759648
Sum squared resid.    84.15421    Schwarz criterion       1.855248
Log likelihood       -222.6340    Hannan-Quinn criter.    1.798076
F-statistic           48.70008    Durbin-Watson stat.     ?
Prob(F-statistic)     0.000000


These results show that the logarithm of wages is positively related to age, education, and job permanency
but negatively related to the sex dummy (female workers earn less than comparable male workers), an unsurprising
finding. Although there seems to be no statistically significant difference between the weekly wages of workers
with primary education and those with less than primary education (the DE2 coefficient has a p value of about 0.25),
the weekly wages are higher for workers with secondary education and even more so for workers with higher education.

The coefficients of the dummy variables are to be interpreted as differential values from the reference 
category. Thus, the coefficient of the DPT variable suggests that those workers who have permanent jobs on 
average make more money than those workers whose jobs are temporary. 

As we know from Chapter 6, in a log-lin model (dependent variable in logarithmic form and the explanatory
variables in linear form), the slope coefficient of an explanatory variable represents a semielasticity,
that is, it gives the relative or percentage change in the dependent variable for a unit change in the value of
the explanatory variable. But, as noted in the text, when the explanatory variable is a dummy variable, we
have to be very careful. Here we have to take the antilog of the estimated dummy coefficient, subtract 1
from it, and multiply the result by 100. Thus, to find the percentage change in weekly wages for
workers who have permanent jobs versus those who have temporary jobs, we take the antilog of the DPT
coefficient of 0.558348, subtract 1, and then multiply the difference by 100. For our example, this turns out to
be $(e^{0.558348} - 1) = (1.74778 - 1) = 0.74778$, or about 75%. The reader is advised to calculate such percentage
changes for the other dummy variables included in the model.
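For the exercise just suggested, the transformation 100(e^b − 1) can be applied to every dummy coefficient at once. A short sketch using the dummy coefficients of the first Ln(WI) regression above (values copied from that output; the helper name `pct_effect` is ours):

```python
import math

# Dummy coefficients from the first Ln(WI) regression reported above.
dummy_coefs = {
    "Dsex": -0.656338,
    "DE2": 0.113862,
    "DE3": 0.412589,
    "DE4": 0.554129,
    "DPT": 0.558348,
}


def pct_effect(b: float) -> float:
    """Percentage change in WI when a dummy switches from 0 to 1: 100*(e^b - 1)."""
    return 100.0 * (math.exp(b) - 1.0)


for name, b in dummy_coefs.items():
    print(f"{name}: {pct_effect(b):.2f}%")
```

For DPT this reproduces the roughly 75% figure computed by hand; for Dsex the result is negative (about −48%), that is, lower weekly wages for the group coded 1, other things being the same.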

Our results show that gender and education have differential effects on weekly earnings. Is it possible that 
there is an interaction between gender and the level of education? Do male workers with higher education 
earn higher weekly wages than female workers with higher education? To examine this possibility, we can 
extend the above wage regression by interacting gender with education. The regression results are as follows: 
Dependent Variable: Ln(WI)
Method: Least Squares
Sample: 1 261
Included observations: 261

Variable     Coefficient  Std. Error  t-Statistic    Prob.
C                 3.7175    0.114536     32.45734   0.0000
AGE             0.027051           ?            ?   0.0000
Dsex             -0.7590    0.110410    -6.874148   0.0000
DE2             0.088923    0.106827     0.832402   0.4060
DE3             0.350574    0.104309     3.360913   0.0009
DE4             0.438673    0.186996     2.345898   0.0198
Dsex*DE2        0.114908    0.275039     0.417788   0.6765
Dsex*DE3               ?           ?            ?        ?
Dsex*DE4        0.369520           ?            ?   0.2396
DPT             0.551658    0.080076     6.889198   0.0000

R-squared             0.540810    Mean dependent var.     4.793330
Adjusted R-squared    0.524345    S.D. dependent var.     0.834277
S.E. of regression    0.575382    Akaike info criterion   1.769997
Sum squared resid.    83.09731    Schwarz criterion       1.906569
Log likelihood       -220.9847    Hannan-Quinn criter.    1.824895
F-statistic           32.84603    Durbin-Watson stat.     1.856488
Prob(F-statistic)     0.000000


Although the interaction dummies suggest that there is some interaction between gender and the level of
education, the effect is not statistically significant: none of the interaction coefficients is individually
statistically significant.

Interestingly, if we drop the education dummies but retain the interaction dummies, we obtain the following 
results: 

Dependent Variable: LOG(WI)
Method: Least Squares
Sample: 1 261
Included observations: 261

Variable     Coefficient  Std. Error  t-Statistic    Prob.
C               3.836483    0.106785     35.92725   0.0000
AGE             0.025990    0.003170     8.197991   0.0000
Dsex           -0.868617    0.106429    -8.161508   0.0000
Dsex*DE2        0.200823    0.259511     0.773851   0.4397
Dsex*DE3               ?           ?     2.925140   0.0038
Dsex*DE4        0.752652    0.265975     2.829789   0.0050
DPT             0.627272    0.078869     7.953382   0.0000

R-squared             0.514449    Mean dependent var.     ?
Adjusted R-squared    0.502979    S.D. dependent var.     0.834277
S.E. of regression    0.588163    Akaike info criterion   1.802828
Sum squared resid.    87.86766    Schwarz criterion       1.898429
Log likelihood       -228.2691    Hannan-Quinn criter.    1.841257
F-statistic           44.85284    Durbin-Watson stat.     ?
Prob(F-statistic)     0.000000
It now seems that the education dummies by themselves have no effect on weekly wages, but when introduced in
interactive form they do. As this exercise shows, one must be careful in the use of dummy variables. It
is left as an exercise for the reader to find out whether the education dummies interact with DPT.
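One way to set up that exercise: each education-by-tenure interaction is simply the product of two dummies, so it equals 1 only for workers who belong to both groups (for example, DE3 × DPT = 1 only for permanent workers with secondary education). The sketch below builds those three extra columns for a few hypothetical records laid out in the Table 9.8 column order; the function name is ours, and the resulting columns would then be added to the Ln(WI) regression alongside AGE, Dsex, DE2 to DE4, and DPT in whatever regression package one uses.

```python
# Hypothetical worker records in the Table 9.8 column order: (DE2, DE3, DE4, DPT).
workers = [
    (0, 0, 0, 0),  # reference education group, temporary job
    (0, 1, 0, 1),  # secondary education, permanent job
    (0, 0, 1, 1),  # higher education, permanent job
    (1, 0, 0, 0),  # primary education, temporary job
]


def add_dpt_interactions(row):
    """Append DE2*DPT, DE3*DPT, DE4*DPT to a (DE2, DE3, DE4, DPT) tuple."""
    de2, de3, de4, dpt = row
    return row + (de2 * dpt, de3 * dpt, de4 * dpt)


for row in workers:
    print(add_dpt_interactions(row))
```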


Summary and Conclusions 


1. Dummy variables, taking values of 1 and 0 (or their linear transforms), are a means of introducing
qualitative regressors in regression models.

2. Dummy variables are a data-classifying device in that they divide a sample into various subgroups 
based on qualities or attributes (gender, marital status, race, religion, etc.) and implicitly allow one to 
run individual regressions for each subgroup. If there are differences in the response of the regressand 
to the variation in the qualitative variables in the various subgroups, they will be reflected in the differ- 
ences in the intercepts or slope coefficients, or both, of the various subgroup regressions. 

3. Although a versatile tool, the dummy variable technique needs to be handled carefully. First, if the 
regression contains a constant term, the number of dummy variables must be one less than the number 
of classifications of each qualitative variable. Second, the coefficient attached to the dummy variables 
must always be interpreted in relation to the base, or reference, group—that is, the group that receives 
the value of zero. The base chosen will depend on the purpose of research at hand. Finally, if a model 
has several qualitative variables with several classes, introduction of dummy variables can consume 
a large number of degrees of freedom. Therefore, one should always weigh the number of dummy 
variables to be introduced against the total number of observations available for analysis. 

4. Of the many applications of dummy variables, this chapter considered only a few. These included (1) comparing two
(or more) regressions, (2) deseasonalizing time series data, (3) interactive dummies, (4) interpretation
of dummies in semilog models, and (5) piecewise linear regression models.

5. We also sounded cautionary notes in the use of dummy variables in situations of heteroscedasticity and 
autocorrelation. But since we will cover these topics fully in subsequent chapters, we will revisit these 
topics then. 
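The first caution in point 3, the dummy variable trap, can be checked numerically. In the sketch below (hypothetical data for a three-category variable), a constant plus all three category dummies gives a four-column regressor matrix of rank 3: the dummies add up to the constant column, so X′X is singular and OLS cannot separate the coefficients. Dropping one dummy restores full column rank. The `matrix_rank` helper is ours; it uses exact fractions to avoid floating-point round-off.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in r] for r in rows]
    rank, ncols = 0, len(m[0])
    for col in range(ncols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


# Hypothetical 3-category variable observed for 6 units (categories 0, 1, 2).
cats = [0, 1, 2, 0, 1, 2]
with_trap = [[1] + [int(c == j) for j in range(3)] for c in cats]     # constant + m dummies
without_trap = [[1] + [int(c == j) for j in (1, 2)] for c in cats]    # constant + (m - 1) dummies
print(matrix_rank(with_trap), len(with_trap[0]))        # 3 4
print(matrix_rank(without_trap), len(without_trap[0]))  # 3 3
```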


Multiple Choice Questions 


1. Dummy variables are variables of the type 
a. Ratio scale 
b. Interval scale 
c. Ordinal scale 
d. Nominal scale 
2. Dummy variables classify the data into 
a. Inclusive categories 
b. Mutually exclusive categories 
c. Qualitative categories 
d. Quantitative categories 
3. If a qualitative variable has ‘m’ categories, we can introduce
a. Only ‘m—1’ dummy variables 
b. Only ‘m’ dummy variables 


c. Only ‘m+1’ dummy variables
d. Only ‘m*2’ dummy variables 


4. We are trying to estimate the differentials in the average annual salary of professors in India for three categories:
those employed at fully government aided colleges ($D_{1i}$), those employed at partially government aided colleges
($D_{2i}$), and those employed at private colleges ($D_{3i}$). Which of the following is NOT a correct functional form?

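a. $Y_i = \beta_1 + \beta_2 D_{2i} + \beta_3 D_{3i} + u_i$

b. $Y_i = \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 D_{3i} + u_i$

c. $Y_i = \beta_1 + \beta_2 D_{1i} + \beta_3 D_{2i} + \beta_4 D_{3i} + u_i$

d. $\ln Y_i = \beta_1 + \beta_2 D_{2i} + \beta_3 D_{3i} + u_i$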
5. For question (4) above, given $Y_i = \beta_1 + \beta_2 D_{2i} + \beta_3 D_{3i} + u_i$, $\beta_1$ represents the mean annual salary of
professors working in

a. Fully government aided colleges 

b. Partially government aided colleges 

c. Private colleges 

d. All three colleges 
6. For question (4) above, the mean annual salary of professors working in fully government aided colleges
is given by

a. B; 

b. B, +B, 

c. By +B; 

d. B, +B; 


7. In trying to test whether females earn less than their male counterparts, we estimate the following model:

$Y_i = \beta_1 + \beta_2 D_i + u_i$, where Y = average earnings per day in Rs and D = 1 for females and 0 otherwise. $\beta_1$ here
refers to the

a. Average earnings of male 

b. Average earnings of female 

c. Differential intercept coefficient for male earnings 

d. Differential intercept coefficient for female earnings 


8. For question (7) above, assuming that our hypothesis about the earnings of male and female
workers is correct, we expect the coefficients $\beta_1$ and $\beta_2$ to have
a. Negative and positive signs respectively 
b. Positive and negative signs respectively 
c. Positive signs 
d. Negative signs 
9. Holding other factors constant, females are generally found to earn less than their male counterparts.
If you regress the earnings of individuals on a constant and two binary variables, ‘Male’, which takes the value 1 for
males and 0 otherwise, and ‘Female’, which takes the value 1 for females and 0 otherwise, you would expect
a. The coefficient for males to have positive sign and females to have negative sign
b. Both coefficients to be the same distance from the constant, one above and the other below
c. None of the OLS estimators to exist because there is perfect multicollinearity
d. This to yield a difference-in-means statistic
10. Dummy variables can take
a. Only 0 and 1 
b. Any positive value 
c. Any linear transformation of 0 and 1 such that $C = a + bD_1$, where $b \neq 0$ and $D_1$ is either 0 or 1

d. Any integer value 
11. For the regression data given in Example 9.1, suppose we estimate the regression model
$Y_i = \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 D_{3i} + \beta_4 D_{4i} + u_i$, where $Y_i$ = monthly per capita expenditure on food,
$D_1$ = 1 for urban areas and 0 otherwise, $D_2$ = 1 for rural areas and 0 otherwise, $D_3$ = 1 for developed states
and 0 otherwise, and $D_4$ = 1 for less developed states (BOMARU²³) and 0 otherwise. The average literacy rate in
rural areas is given by

a. $\beta_1$

b. $\beta_2$

c. $\beta_3$

d. $\beta_4$
12. For question (11) above, the differential in literacy rate between rural and urban areas is given by

a. B 

b. B, +B, 

c. B,-B, 

d. P+ Bı 


13. For question (11) above, the average literacy rate for less developed states is given by


b. $\beta_3 + \beta_4$
c. $\beta_4$
d. $\beta_3 - \beta_4$


14. For question (11) above, the differential in average literacy rate between developed and less developed states
is given by


a. $\beta_3$
b. $\beta_3 + \beta_4$
c. $\beta_4$
d. $\beta_3 - \beta_4$


15. ANCOVA models include regressors that are
a. Only quantitative variables 
b. Only qualitative variables 
c. Only categorical variables
d. Both qualitative and quantitative variables 
16. ANOVA models include
a. Only quantitative variables 
b. Only qualitative variables 
c. Only categorical variables 
d. Both qualitative and quantitative variables 


17. The process of removing the seasonal component from time series data is known as


a. Seasonalization 

b. Seasonality 

c. Deseasonalization 

d. Seasonal trend testing 


²³BOMARU refers to Bihar, Orissa, Madhya Pradesh, Assam, Rajasthan and Uttar Pradesh. See Shovan Ray (ed.), Backwaters of
Development: Six Deprived States of India, Oxford University Press, 2010.
18. Given the regression model $Y_i = \alpha_1 + \beta_1 X_i + \beta_2 (X_i - X^*) D_i + u_i$, such a model is used to

a. Deseasonalize quarterly time series data

b. Seasonally adjust monthly time series data

c. Conduct a Chow test for a structural break in time series data

d. Test for a piecewise linear relationship in the data
19. In question (18) above, $X^*$ refers to

a. 95% critical value of X 

b. 5% critical value of t 

c. Predetermined threshold level for piecewise linear relationship 

d. Predetermined structural break for the Chow test
20. The regression model $Y_t = \alpha_1 + \alpha_2 D_t + \beta_1 X_t + \beta_2 (X_t D_t) + u_t$ is used to

a. Deseasonalize time series data

b. Analyze the seasonal trend in data 

c. Analyze structural break in data 

d. Analyze piecewise linear relationship in data 
21. Consider the regression model $Y_t = \alpha_1 D_{1t} + \alpha_2 D_{2t} + \alpha_3 D_{3t} + \alpha_4 D_{4t} + \alpha_5 D_{5t} + u_t$, where Y = NSE index,
$D_1$ = 1 for Monday and 0 otherwise, and similarly $D_2$ to $D_5$ = 1 for Tuesday, Wednesday, Thursday, and Friday,
respectively, and 0 otherwise. Such a model may be used to analyze

a. Seasonality in weekly data 

b. Deseasonalize daily data 

c. Trend in NSE index 

d. Structural break in weekly data 
22. In the regression model given in question (20), if we find $\alpha_2$ to be statistically insignificant, this means
that between the two sample periods analyzed

a. The intercept coefficients are statistically different from each other

b. The intercept coefficients are not statistically different from each other

c. The slope coefficients are statistically different from each other

d. The slope coefficients are not statistically different from each other
23. In the regression model given in question (20), if we find $\alpha_1$ and $\beta_1$ to be statistically significant but not
$\alpha_2$ and $\beta_2$, this means that

a. There is no structural break in the data 

b. There is structural break in the data and it is due to intercept term 

c. There is structural break in the data and it is due to slope coefficients 

d. There is structural break in the data and it is due to both intercept and slope coefficients 
24. Given the model $Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 (D_{2i} + D_{3i}) + \beta X_i + u_i$, the mean value of $Y_i$ when both
dummy variables take the value 1 is given by

a. $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4$

b. (œ; + a, + a3 + &4)PX; 

C. Ay + My 

d. (a, + a,4)BX; 
25. The Chow test without dummy variables does NOT tell us

a. That the two regressions are different 

b. There is significant structural break in the data 

c. The source of difference between the two regressions 

d. That the estimates of the two regression functions are statistically significant
Questions


9.1. If you have monthly data over a number of years, how many dummy variables will you introduce to 
test the following hypotheses:
a. All the 12 months of the year exhibit seasonal patterns. 
b. Only February, April, June, August, October, and December exhibit seasonal patterns. 

9.2. Consider the following regression results (t ratios are in parentheses):²⁴

$$\hat{Y}_i = 1286 + 104.97 X_{2i} - 0.026 X_{3i} + 1.20 X_{4i} + 0.69 X_{5i} - 19.47 X_{6i} + 266.06 X_{7i} - 118.64 X_{8i} - 26.35 X_{9i}$$

t = (4.67) (3.70) (−3.80) (0.24) (0.08) (−0.40) (6.94) (−3.04) (−6.14)

R² = 0.383    n = 1543


where Y = wife’s annual desired hours of work, calculated as usual hours of work per year plus weeks
looking for work
$X_2$ = after-tax real average hourly earnings of wife
$X_3$ = husband’s previous year after-tax real annual earnings
$X_4$ = wife’s age in years
$X_5$ = years of schooling completed by wife
$X_6$ = attitude variable, 1 if respondent felt that it was all right for a woman to work if she
desired and her husband agrees, 0 otherwise
$X_7$ = attitude variable, 1 if the respondent’s husband favored his wife’s working, 0 otherwise
$X_8$ = number of children less than 6 years of age
$X_9$ = number of children in age groups 6 to 13

a. Do the signs of the coefficients of the various nondummy regressors make economic sense? Justify
your answer.

b. How would you interpret the dummy variables, $X_6$ and $X_7$? Are these dummies statistically
significant? Since the sample is quite large, you may use the “2-t” rule of thumb to answer the
question.

c. Why do you think that age and education variables are not significant factors in a woman’s labor 
force participation decision in this study? 

9.3. Consider the following regression results.²⁵ (The actual data are in Table 9.9.)

$$\widehat{UN}_t = 2.7491 + 1.1507 D_t - 1.5294 V_t - 0.8511 (D_t V_t)$$
t = (26.896) (3.6288) (−12.5552) (−1.9819)
R² = 0.9128

where UN = unemployment rate, %
V = job vacancy rate, %


²⁴Jane Leuthold, “The Effect of Taxation on the Hours Worked by Married Women,” Industrial and Labor Relations Review, no.
4, July 1978, pp. 520-526 (notation changed to suit our format).


²⁵Damodar Gujarati, “The Behaviour of Unemployment and Unfilled Vacancies: Great Britain, 1958-1971,” The Economic
Journal, vol. 82, March 1972, pp. 195-202.
Table 9.9 Data Matrix for Regression in Exercise 9.3

Year and   Unemployment  Job Vacancy            | Year and   Unemployment  Job Vacancy
Quarter    Rate UN, %    Rate V, %    D  DV     | Quarter    Rate UN, %    Rate V, %    D  DV
1958-IV    1.915         0.510        0  0      | 1965-I     1.201         0.997        0  0
1959-I     1.876         0.541        0  0      | -II        1.192         1.035        0  0
-II        1.842         0.541        0  0      | -III       1.259         1.040        0  0
-III       1.750         0.690        0  0      | -IV        1.192         1.086        0  0
-IV        1.648         0.771        0  0      | 1966-I     1.089         1.101        0  0
1960-I     1.450         0.836        0  0      | -II        1.101         1.058        0  0
-II        1.393         0.908        0  0      | -III       1.243         0.987        0  0
-III       1.322         0.968        0  0      | -IV        1.623         0.819        1  0.819
-IV        1.260         0.998        0  0      | 1967-I     1.821         0.740        1  0.740
1961-I     1.171         0.968        0  0      | -II        1.990         0.661        1  0.661
-II        1.182         0.964        0  0      | -III       2.114         0.660        1  0.660
-III       1.221         0.952        0  0      | -IV        2.115         0.698        1  0.698
-IV        1.340         0.849        0  0      | 1968-I     2.150         0.695        1  0.695
1962-I     1.411         0.748        0  0      | -II        2.141         0.732        1  0.732
-II        1.600         0.658        0  0      | -III       2.167         0.749        1  0.749
-III       1.780         0.562        0  0      | -IV        2.107         0.800        1  0.800
-IV        1.941         0.510        0  0      | 1969-I     2.104         0.783        1  0.783
1963-I     2.178         0.510        0  0      | -II        2.056         0.800        1  0.800
-II        2.067         0.544        0  0      | -III       2.170         0.794        1  0.794
-III       1.942         0.568        0  0      | -IV        2.161         0.790        1  0.790
-IV        1.764         0.677        0  0      | 1970-I     2.225         0.757        1  0.757
1964-I     1.532         0.794        0  0      | -II        2.241         0.746        1  0.746
-II        1.455         0.838        0  0      | -III       2.366         0.739        1  0.739
-III       1.409         0.885        0  0      | -IV        2.324         0.707        1  0.707
-IV        1.296         0.978        0  0      | 1971-I     2.516         0.583        1  0.583
                                                | -II        2.969*        0.524*       1  0.524*

*Preliminary estimates.
Source: Damodar Gujarati, “The Behaviour of Unemployment and Unfilled Vacancies: Great Britain, 1958-1971,” The Economic Journal, vol. 82, March
1972, p. 202.


D = 1 for the period beginning in 1966-IV
  = 0 for the period before 1966-IV
t = time, measured in quarters

Note: In the fourth quarter of 1966, the (then) Labor government liberalized the National Insurance 
Act by replacing the flat-rate system of short-term unemployment benefits by a mixed system of 
flat-rate and (previous) earnings-related benefits, which increased the level of unemployment benefits. 
a. What are your prior expectations about the relationship between the unemployment and vacancy
rates?
b. Holding the job vacancy rate constant, what is the average unemployment rate in the period
beginning in the fourth quarter of 1966? Is it statistically different from that in the period before the
fourth quarter of 1966? How do you know?
c. Are the slopes in the periods before and after the fourth quarter of 1966 statistically different? How do you know?
d. Is it safe to conclude from this study that generous unemployment benefits lead to higher 
unemployment rates? Does this make economic sense? 
9.4. From annual data for 1972-1979, William Nordhaus estimated the following model to explain
OPEC’s oil price behavior (standard errors in parentheses).²⁶

$$\hat{y}_t = 0.3 x_{1t} + 5.22 x_{2t}$$
se = (0.03) (0.50)

where y = difference between current and previous year’s price (dollars per barrel)
$x_1$ = difference between current year’s spot price and OPEC’s price in the previous year
$x_2$ = 1 for 1974 and 0 otherwise
Interpret this result and show the results graphically. What do these results suggest about OPEC’s
monopoly power?
9.5. Consider the following model 


$$Y_i = \alpha_1 + \alpha_2 D_i + \beta X_i + u_i$$


where Y = annual salary of a college professor 
X = years of teaching experience 
D = dummy for gender 
Consider three ways of defining the dummy variable. 
a. D = 1 for male, 0 for female. 
b. D = 1 for female, 2 for male. 
c. D = 1 for female, −1 for male.
Interpret the preceding regression model for each dummy assignment. Is one method preferable to 
another? Justify your answer. 

9.6. Refer to regression (9.7.3). How would you test the hypothesis that the coefficients of $D_2$ and $D_3$ are
the same? And that the coefficients of $D_2$ and $D_4$ are the same? If the coefficient of $D_3$ is statistically
different from that of $D_2$ and the coefficient of $D_4$ is different from that of $D_2$, does that mean that the
coefficients of $D_3$ and $D_4$ are also different?

Hint: var (A + B) = var (A) + var (B) + 2 cov (A, B) 

9.7. Refer to India’s savings-income example discussed in Section 9.5.

a. How would you obtain the standard errors of the regression coefficients given in Eqs. (9.5.5) and 
(9.5.6), which were obtained from the pooled regression (9.5.4)? 
b. To obtain numerical answers, what additional information, if any, is required? 

9.8. In his study of the labor hours spent by the FDIC (Federal Deposit Insurance Corporation) on 91 bank
examinations, R. J. Miller estimated the following function:²⁷

$$\widehat{\ln Y} = 2.41 + \underset{(0.0477)}{0.3674} \ln X_1 + \underset{(0.0628)}{0.2217} \ln X_2 + \underset{(0.0287)}{0.0803} \ln X_3 - \underset{(0.2905)}{0.1755} D_1 + \underset{(0.1044)}{0.2799} D_2 + \underset{(0.1657)}{0.5634} D_3 - \underset{(0.0787)}{0.2572} D_4$$

R² = 0.766


²⁶“Oil and Economic Performance in Industrial Countries,” Brookings Papers on Economic Activity, 1980, pp. 341-388.


²⁷“Examination of Man-Hour Cost for Independent, Joint, and Divided Examination Programs,” Journal of Bank Research, vol.
11, 1980, pp. 28-35. Note: The notations have been altered to conform with our notations.
where Y = FDIC examiner labor hours
$X_1$ = total assets of bank
$X_2$ = total number of offices in bank
$X_3$ = ratio of classified loans to total loans for bank
$D_1$ = 1 if management rating was “good”
$D_2$ = 1 if management rating was “fair”
$D_3$ = 1 if management rating was “satisfactory”
$D_4$ = 1 if examination was conducted jointly with the state
The figures in parentheses are the estimated standard errors. 
a. Interpret these results. 
b. Is there any problem in interpreting the dummy variables in this model since Y is in the log form? 
c. How would you interpret the dummy coefficients? 
9.9. To assess the effect of the Fed’s policy of deregulating interest rates beginning in July 1979, Sidney
Langer, a student of mine, estimated the following model for the quarterly period of 1975-III to
1983-II.²⁸

$$\hat{Y}_t = 8.5871 - 0.1328 P_t - 0.7102 Un_t - 0.2389 M_t + 0.6592 Y_{t-1} + 2.5831 Dum_t$$
se = (1.9563) (0.0992) (0.1909) (0.0727) (0.1036) (0.7549)    R² = 0.9156


where Y = 3-month Treasury bill rate 
P = expected rate of inflation
Un = seasonally adjusted unemployment rate
M = changes in the monetary base 
Dum = dummy, taking value of 1 for observations 
beginning July 1, 1979 
a. Interpret these results. 
b. What has been the effect of interest rate deregulation? Do 
the results make economic sense? 
c. The coefficients of P, Un, and M, are negative. Can you 
offer an economic rationale? 
9.10. Refer to the piecewise regression discussed in the text. Suppose there not only is a change in the slope
coefficient at X* but also the regression line jumps, as shown in Figure 9.7. How would you modify Eq. (9.8.1)
to take into account the jump in the regression line at X*?

Figure 9.7 Discontinuous piecewise linear regression.

9.11. Determinants of price per ounce of cola. Cathy Schaefer, a student of mine, estimated the following
regression from cross-sectional data of 77 observations:²⁹


$$P_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 D_{3i} + \mu_i$$


where $P_i$ = price per ounce of cola


²⁸Sidney Langer, “Interest Rate Deregulation and Short-Term Interest Rates,” unpublished term paper.
²⁹Cathy Schaefer, “Price Per Ounce of Cola Beverage as a Function of Place of Purchase, Size of Container, and Branded or
Unbranded Product,” unpublished term project.
D_1i = 001 if discount store
     = 010 if chain store
     = 100 if convenience store

D_2i = 10 if branded good
     = 01 if unbranded good

D_3i = 0001 if 67.6 ounce (2 liter) bottle
     = 0010 if 28–33.8 ounce bottles (Note: 33.8 oz = 1 liter)
     = 0100 if 16-ounce bottle
     = 1000 if 12-ounce can

The results were as follows: 


$$\hat{P}_i = 0.0143 - 0.000004D_{1i} + 0.0090D_{2i} + 0.00001D_{3i}$$
se = (0.00001)   (0.00011)   (0.00000)
t = (−0.3837)   (8.3927)   (5.8125)        R² = 0.6033


Note: The standard errors are shown only to five decimal places. 

a. Comment on the way the dummies have been introduced in the model. 

b. Assuming the dummy setup is acceptable, how would you interpret the results? 

c. The coefficient of D, is positive and statistically significant. How do you rationalize this result? 
9.12. From data for 101 countries on per capita income in dollars (X) and life expectancy in years (Y) in the early 1970s, Sen and Srivastava obtained the following regression results:


$$\hat{Y}_i = -2.40 + 9.39\ln X_i - 3.36\left[D_i(\ln X_i - 7)\right]$$
se = (4.73)   (0.859)   (2.42)        R² = 0.752

where D_i = 1 if ln X_i > 7, and D_i = 0 otherwise. Note: When ln X_i = 7, X = $1,097 (approximately).

a. What might be the reason(s) for introducing the income variable in the log form? 

b. How would you interpret the coefficient 9.39 of ln X_i?

c. What might be the reason for introducing the regressor D_i(ln X_i − 7)? How do you explain this regressor verbally? And how do you interpret the coefficient −3.36 of this regressor (Hint: linear piecewise regression)?

d. Assuming per capita income of $1,097 as the dividing line between poorer and richer countries, how would you derive the regression for countries whose per capita income is less than $1,097 and the regression for countries whose per capita income is greater than $1,097?

e. What general conclusions do you draw from the regression result presented in this problem? 

9.13. Consider the following model:

$$Y_i = \beta_1 + \beta_2 D_i + u_i$$

where D_i = 0 for the first 20 observations and D_i = 1 for the remaining 30 observations. You are also told that var(u_i) = 300.
a. How would you interpret β_1 and β_2?
b. What are the mean values of the two groups?
c. How would you compute the variance of (β̂_1 + β̂_2)? Note: You are given that cov(β̂_1, β̂_2) = −15.
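A short Python check of the arithmetic in part (c), a sketch assuming homoscedastic errors with σ² = var(u_i) = 300 and the standard OLS results for a regression on a single 0/1 dummy: var(β̂_1) = σ²/n_1, var(β̂_2) = σ²(1/n_1 + 1/n_2), and cov(β̂_1, β̂_2) = −σ²/n_1.

```python
sigma2 = 300.0     # var(u_i), the error variance given in the exercise
n1, n2 = 20, 30    # observations with D_i = 0 and D_i = 1

var_b1 = sigma2 / n1                  # beta1_hat is the sample mean of the D = 0 group
var_b2 = sigma2 * (1 / n1 + 1 / n2)   # beta2_hat is a difference of two independent means
cov_b1_b2 = -sigma2 / n1              # cov(beta1_hat, beta2_hat)

# var(b1 + b2) = var(b1) + var(b2) + 2 cov(b1, b2)
var_sum = var_b1 + var_b2 + 2 * cov_b1_b2
print(var_b1, var_b2, cov_b1_b2, var_sum)
```

The result equals σ²/n_2 = 10, as it should, because β̂_1 + β̂_2 is simply the sample mean of the 30 observations with D_i = 1.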


“Ashish Sen and Muni Srivastava, Regression Analysis: Theory, Methods, and Applications, Springer Verlag, New York, 1990, 
p. 92. Notation changed. 




9.14. To assess the effect of state right-to-work laws (which do not require membership in the union as a 
precondition of employment) on union membership, the following regression results were obtained, 
from the data for 50 states in the United States for 1982: 


$$\widehat{PVT}_i = 19.8066 - 9.3917\,RTW_i$$
t = (17.0352)   (−5.1086)
r² = 0.3522


where PVT = percentage of private sector employees in unions, 1982, and RTW = 1 if right-to-work 
law exists, 0 otherwise. Note: In 1982, twenty states had right-to-work laws. 
a. A priori, what is the expected relationship between PVT and RTW? 
b. Do the regression results support the prior expectations? 
c. Interpret the regression results. 
d. What was the average percent of private sector employees in unions in the states that did not have 
the right-to-work laws? 
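With a single dummy regressor estimated by OLS with an intercept, the intercept is the sample mean of the group coded 0, and intercept plus slope is the mean of the group coded 1. A minimal Python sketch using the reported estimates:

```python
# Estimates reported in Exercise 9.14.
intercept = 19.8066   # coefficient of the constant term
rtw_coef = -9.3917    # differential intercept for right-to-work (RTW = 1) states

mean_non_rtw = intercept              # RTW = 0: the intercept itself
mean_rtw = intercept + rtw_coef       # RTW = 1: intercept plus differential intercept
print(f"non-RTW states: {mean_non_rtw:.4f}%  RTW states: {mean_rtw:.4f}%")
```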
9.15. In the following regression model: 


$$Y_i = \beta_1 + \beta_2 D_i + u_i$$

Y represents hourly wage in dollars and D is the dummy variable, taking a value of 1 for a college graduate and a value of 0 for a high-school graduate. Using the OLS formulas given in Chapter 3, show that β̂_1 = Ȳ_hg and β̂_2 = Ȳ_cg − Ȳ_hg, where the subscripts have the following meanings: hg = high-school graduate, cg = college graduate. In all, there are n_1 high-school graduates and n_2 college graduates, for a total sample of n = n_1 + n_2.
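A numerical check of this result, as a minimal Python sketch with hypothetical wage figures; `ols_simple` is a helper written here for illustration, not a library function.

```python
def ols_simple(x, y):
    """Two-variable OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical hourly wages in dollars (illustrative only).
hs_wages = [10.0, 12.0, 11.0]             # D = 0: high-school graduates
college_wages = [18.0, 20.0, 22.0, 16.0]  # D = 1: college graduates

y = hs_wages + college_wages
d = [0] * len(hs_wages) + [1] * len(college_wages)

b1, b2 = ols_simple(d, y)
mean_hg = sum(hs_wages) / len(hs_wages)
mean_cg = sum(college_wages) / len(college_wages)
print(b1, mean_hg)            # intercept equals the high-school mean
print(b2, mean_cg - mean_hg)  # slope equals the difference in group means
```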

9.16. To study the rate of growth of population in Belize over the period 1970-1992, Mukherjee et al. 
estimated the following models: 


Model I:
$$\ln(\text{Pop}_t) = 4.73 + 0.024t$$
t = (781.25)   (54.71)

Model II:
$$\ln(\text{Pop}_t) = 4.77 + 0.015t - 0.075D_t + 0.011(D_t t)$$
t = (2477.92)   (34.01)   (−17.03)   (25.54)

where Pop = population in millions, t = trend variable, D_t = 1 for observations beginning in 1978 and 0 before 1978, and ln stands for natural logarithm.

a. In Model I, what is the rate of growth of Belize’s population over the sample period?

b. Are the population growth rates statistically different pre- and post-1978? How do you know? If they are different, what are the growth rates for 1972–1977 and 1978–1992?
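Because the dependent variable is ln(Pop) and t is measured in years, each trend coefficient is an instantaneous annual growth rate g, and the implied compound rate is e^g − 1. A minimal Python sketch using the reported coefficients, assuming the post-1978 rate is the trend coefficient plus the differential slope on D_t·t:

```python
import math

# Trend coefficients reported in Exercise 9.16 (t measured in years).
g_model1 = 0.024            # Model I: a single growth rate for the whole sample
g_pre = 0.015               # Model II, D_t = 0 (before 1978)
g_post = 0.015 + 0.011      # Model II, D_t = 1: trend plus differential slope on D_t*t

for label, g in [("Model I", g_model1), ("Model II, pre-1978", g_pre),
                 ("Model II, 1978 on", g_post)]:
    compound = math.exp(g) - 1   # compound annual rate implied by a log-linear trend
    print(f"{label}: instantaneous {100 * g:.2f}%, compound {100 * compound:.2f}%")
```

Judged by its t value of 25.54, the differential slope coefficient appears highly significant, which is what would indicate different pre- and post-1978 growth rates.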


Empirical Exercises 


9.17. Using the data given in Table 9.9, test the hypothesis that the error variances in the two subperiods 1958-IV to 1966-III and 1966-IV to 1971-II are the same.

9.18. Using the methodology discussed in Chapter 8, compare the unrestricted and restricted regressions 
(9.7.3) and (9.7.4); that is, test for the validity of the imposed restrictions. 


“The data used in the regression results were obtained from N. M. Meltz, “Interstate and Interprovincial Differences in Union 
Density,” Industrial Relations, vol. 28, no. 2, 1989, pp. 142-158. 

“Chandan Mukherjee, Howard White, and Marc Wuyts, Econometrics and Data Analysis for Developing Countries, Routledge, 
London, 1998, pp. 372-375. Notations adapted. 
9.19. In the Indian savings–income regression (9.5.4) discussed in the chapter, suppose that instead of using 1 and 0 values for the dummy variable you use Z_i = a + bD_i, where D_i = 1 and 0, a = 2, and b = 3. Compare your results.

9.20. Continuing with the savings–income regression (9.5.4), suppose you were to assign D_i = 0 to observations in the second period and D_i = 1 to observations in the first period. How would the results shown in Eq. (9.5.4) change?
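To see what recoding a dummy does, here is a minimal Python sketch. It deliberately simplifies (9.5.4) to a regression of savings on the dummy alone, with hypothetical numbers (not Table 9.3); `ols_simple` is a helper written for illustration. With Z_i = a + bD_i the slope becomes β̂_2/b and the intercept β̂_1 − aβ̂_2/b, and reversing the 0/1 assignment is the special case a = 1, b = −1.

```python
def ols_simple(x, y):
    """Two-variable OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical savings figures for two periods (illustrative only).
savings = [5.0, 6.0, 7.0, 9.0, 10.0, 12.0]
D = [0, 0, 0, 1, 1, 1]            # 1/0 dummy: D = 1 in the second period

b1, b2 = ols_simple(D, savings)

# Exercise 9.19: Z_i = a + b*D_i with a = 2, b = 3.
a, b = 2, 3
Z = [a + b * d for d in D]
c1, c2 = ols_simple(Z, savings)

# Exercise 9.20: reversed coding D' = 1 - D (the special case a = 1, b = -1).
r1, r2 = ols_simple([1 - d for d in D], savings)

# Fitted values are identical under every coding; only the coefficients change.
fit_D = [b1 + b2 * d for d in D]
fit_Z = [c1 + c2 * z for z in Z]
print(b1, b2)   # intercept = first-period mean, slope = difference in means
print(c1, c2)   # c2 = b2 / b, c1 = b1 - a * b2 / b
print(r1, r2)   # r1 = b1 + b2, r2 = -b2
print(max(abs(p - q) for p, q in zip(fit_D, fit_Z)))
```

Since {1, Z} spans the same space as {1, D} whenever b ≠ 0, the fitted values stay the same when income and the interaction term are added back as in (9.5.4); only the way the coefficients are expressed changes.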

9.21. Use the data given in Table 9.3 and consider the following model:

$$\ln \text{Savings}_i = \beta_1 + \beta_2 \ln \text{Income}_i + \beta_3 \ln D_i + u_i$$

where ln stands for natural log and where D_i = 1 for 1974–75 to 1988–89 and 10 for 1989–90 to 1995–96.

a. What is the rationale behind assigning dummy values as suggested? 

b. Estimate the preceding model and interpret your results. 

c. What are the intercept values of the savings function in the two subperiods and how do you interpret 
them? 
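Because ln 1 = 0 while ln 0 is undefined, coding the subperiods as 1 and 10 turns ln D_i into a variable that takes only the values 0 and ln 10 ≈ 2.3026. Assuming the model as written, the two intercepts then follow directly:

$$\ln \text{Savings}_i =
\begin{cases}
\beta_1 + \beta_2 \ln \text{Income}_i + u_i, & D_i = 1 \ (\ln D_i = 0)\\[4pt]
(\beta_1 + \beta_3 \ln 10) + \beta_2 \ln \text{Income}_i + u_i \approx (\beta_1 + 2.3026\,\beta_3) + \beta_2 \ln \text{Income}_i + u_i, & D_i = 10
\end{cases}$$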

9.22. Refer to the quarterly appliance sales data given in Table 9.4. Consider the following model:

$$\text{Sales}_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 D_{4i} + u_i$$

where the D’s are dummies taking 1 and 0 values for quarters II through IV.

a. Estimate the preceding model for dishwashers, disposers, and washing machines individually. 

b. How would you interpret the estimated slope coefficients? 

c. How would you use the estimated α’s to deseasonalize the sales data for individual appliances?
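The following Python sketch illustrates one common way to use the estimated α’s. With an intercept and three quarterly dummies, OLS fits each quarter’s sample mean, so the residuals are the sales figures net of the seasonal pattern; adding back the overall mean puts them on the original scale. The sales numbers are hypothetical, not Table 9.4.

```python
# Hypothetical quarterly sales for three years (illustrative only).
sales = [120, 150, 140, 190,
         125, 160, 150, 200,
         130, 155, 145, 210]
quarters = [i % 4 + 1 for i in range(len(sales))]   # 1, 2, 3, 4, 1, 2, ...

# With an intercept and dummies D2, D3, D4, OLS fits each quarter's sample mean:
# alpha_1 = mean of quarter I; alpha_j = mean of quarter j minus mean of quarter I.
q_mean = {}
for q in range(1, 5):
    vals = [s for s, qq in zip(sales, quarters) if qq == q]
    q_mean[q] = sum(vals) / len(vals)
alpha = {1: q_mean[1]}
for q in (2, 3, 4):
    alpha[q] = q_mean[q] - q_mean[1]

# Fitted value = alpha_1 plus the differential intercept of the observation's quarter.
fitted = [alpha[1] + (alpha[q] if q > 1 else 0.0) for q in quarters]
residuals = [s - f for s, f in zip(sales, fitted)]

# Deseasonalized series: residuals (sales net of seasonal effects) plus the overall mean.
overall_mean = sum(sales) / len(sales)
deseasonalized = [r + overall_mean for r in residuals]

print(alpha)
print([round(x, 2) for x in deseasonalized])
```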

9.23. Reestimate the model in Exercise 9.22 by adding the regressor, expenditure on durable goods.

a. Is there a difference in the regression results you obtained in Exercise 9.22 and in this exercise? If 
so, what explains the difference? 

b. If there is seasonality in the durable goods expenditure data, how would you account for it? 

9.24. Table 9.10 gives data on quadrennial presidential elections in the United States from 1916 to 2004.’

a. Using the data given in Table 9.10, develop a suitable model to predict the Democratic share of the two-party presidential vote.

b. How would you use this model to predict the outcome of a presidential election? 

c. Chatterjee et al. suggested considering the following model as a trial model to predict presidential 
elections: 


$$V = \beta_0 + \beta_1 I + \beta_2 D + \beta_3 W + \beta_4 (GI) + \beta_5 P + \beta_6 N + u$$


Estimate this model and comment on the results in relation to the results of the model you have 
chosen. 

9.25. Refer to regression (9.6.4). Test the hypothesis that the rate of increase of average literacy rate with
respect to per capita net state domestic product differs by gender and area of residence. (Hint: Use 
multiplicative dummies.) 

9.26. Refer to the regression (9.3.1). How would you modify the model to find out if there is any interaction
between the gender and the region of residence dummies? Present the results based on this model and 
compare them with those given in Eq. (9.3.1). 


‘These data were originally compiled by Ray Fair of Yale University, who has been predicting the outcome of presidential elections for several years. The data are reproduced from Samprit Chatterjee, Ali S. Hadi, and Bertram Price, Regression Analysis by Example, 3d ed., John Wiley & Sons, New York, 2000, pp. 150–151 and updated from http://fairmodel.econ.yale.edu/rayfair/pdf/2006CHTM.HTM.
Table 9.10 U.S. Presidential Elections, 1916–2004

Obs.  Year   V        W    D     G         I    N    P
 1    1916   0.5168   0    1     2.229     1    3    4.252
 2    1920   0.3612   1    0   −11.46      1    5   16.535
 3    1924   0.4176   0   −1    −3.872    −1   10    5.161
 4    1928   0.4118   0    0     4.623    −1    4    0.183
 5    1932   0.5916   0   −1   −14.9      −1    4    7.069
 6    1936   0.6246   0    1    11.921     1    9    2.362
 7    1940   0.55     0    1     3.708     1    8    0.028
 8    1944   0.5377   1    1     4.119     1   14    5.678
 9    1948   0.5237   1    1     1.849     1    5    8.722
10    1952   0.446    0    0     0.627     1    6    2.288
11    1956   0.4224   0   −1    −1.527    −1    5    1.936
12    1960   0.5009   0    0     0.114    −1    5    932
13    1964   0.6134   0    1     5.054     1   10    1.247
14    1968   0.496    0    0     4.836     1    7    3.215
15    1972   0.3821   0   −1     6.278    −1    4    4.766
16    1976   0.5105   0    0     3.663    −1    4    7.657
17    1980   0.447    0    1    −3.789     1    5    8.093
18    1984   0.4083   0   −1     5.387    −1    7    5.403
19    1988   0.461    0    0     2.068    −1    6    3.202
20    1992   0.5345   0   −1     2.293    −1    1    3.692
21    1996   0.5474   0    1     2.918     1    3    2.268
22    2000   0.50265  0    0     1.219     1    8    1.605
23    2004   0.51233  0    1     2.69     −1    1


Notes: 

Year Election year 

V Incumbent share of the two-party presidential vote. 

W Indicator variable (1 for the elections of 1920, 1944, and 1948, and 0 otherwise). 

D Indicator variable (1 if a Democratic incumbent is running for election, −1 if a Republican incumbent is running for election, and 0 otherwise).

G Growth rate of real per capita GDP in the first three quarters of the election year. 

I Indicator variable (1 if there is a Democratic incumbent at the time of the election and −1 if there is a Republican incumbent).

N Number of quarters in the first 15 quarters of the administration in which the growth rate of real per capita GDP is greater than 3.2%. 

P Absolute value of the growth rate of the GDP deflator in the first 15 quarters of the administration. 


9.27. In the model Y_i = β_1 + β_2D_i + u_i, let D_i = 0 for the first 40 observations and D_i = 1 for the remaining 60 observations. You are told that u_i has zero mean and a variance of 100. What are the mean values and variances of the two sets of observations?”

9.28. Refer to the Indian savings–income regression discussed in the chapter. As an alternative to Eq. (9.5.1), consider the following model:

$$\ln Y_i = \beta_1 + \beta_2 D_i + \beta_3 X_i + \beta_4 (D_i X_i) + u_i$$


where Y is savings and X is income. 

a. Estimate the preceding model and compare the results with those given in Eq. (9.5.4). Which is a 
better model? 

b. How would you interpret the dummy coefficient in this model? 


“This example is adapted from Peter Kennedy, A Guide to Econometrics, 4th ed., MIT Press, Cambridge, Mass., 1998, p. 347. 
c. As we will see in the chapter on heteroscedasticity, very often a log transformation of the dependent
variable reduces heteroscedasticity in the data. See if this is the case in the present example by 
running the regression of log of Y on X for the two periods and see if the estimated error variances 
in the two periods are statistically the same. If they are, the Chow test can be used to pool the data 
in the manner indicated in the chapter. 

9.29. Refer to the Indian wage earners example (Section 9.12) and the data in Table 9.8.” As a reminder, the variables are defined as follows:
WI = weekly wage income in rupees
Age = age in years
D_sex = 1 for male workers and 0 for female workers
DE_2 = a dummy variable taking a value of 1 for workers with up to a primary education
DE_3 = a dummy variable taking a value of 1 for workers with up to a secondary education
DE_4 = a dummy variable taking a value of 1 for workers with higher education
DPT = a dummy variable taking a value of 1 for workers with permanent jobs and a value of 0 for temporary workers

The reference category is male workers with no primary education and temporary jobs. 

In Section 9.12, interaction terms were created between the education variables (DE_2, DE_3, and DE_4) and the gender variable (D_sex). What happens if we create interaction terms between the education dummies and the permanent worker dummy variable (DPT)?

a. Estimate the model predicting ln WI containing age, gender, the education dummy variables, and three new interaction terms: DE_2 × DPT, DE_3 × DPT, and DE_4 × DPT. Does there appear to be a significant interaction effect among the new terms?

b. Is there a significant difference between workers with an education level up to primary and those 
without a primary education? Assess this with respect to both the education dummy variable and 
the interaction term and explain the results. What about the difference between workers with a 
secondary level of education and those without a primary level of education? What about the 
difference between those with an education level beyond secondary, compared to those without a 
primary level of education? 

c. Now assess the results of deleting the education dummies from the model. Do the interaction terms 
change in significance? 
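Constructing the new regressors is mechanical: each interaction term is the product of an education dummy and DPT, so it equals 1 only for a worker who has that level of education and a permanent job. A minimal Python sketch with a few hypothetical worker records (not Table 9.8); the column names such as `DE2xDPT` are illustrative.

```python
# Hypothetical records: (DE2, DE3, DE4, DPT) for a few workers (illustrative only).
workers = [
    (0, 0, 0, 1),  # no primary education, permanent job
    (1, 0, 0, 0),  # up to primary, temporary job
    (0, 1, 0, 1),  # up to secondary, permanent job
    (0, 0, 1, 1),  # higher education, permanent job
]

# Each interaction dummy is the product of its two parent dummies, so it is 1
# only when both conditions hold (e.g., secondary education AND a permanent job).
rows = []
for de2, de3, de4, dpt in workers:
    rows.append({
        "DE2": de2, "DE3": de3, "DE4": de4, "DPT": dpt,
        "DE2xDPT": de2 * dpt, "DE3xDPT": de3 * dpt, "DE4xDPT": de4 * dpt,
    })

for r in rows:
    print(r)
```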


Key to Multiple Choice Questions 


1. (d) 2. (b) 3. (a) 4. (c) 5O 6. (b) 7. (d) 8. (b) 9E) 
10. (c) 11. (b) 12. (a) 13. (c) 14. (d) 15. (d) 16. (a) 17. (c) 18. (d)
Appendix 9A


Semilogarithmic Regression with Dummy Regressor
In Section 9.10 we noted that in models of the type
$$\ln Y_i = \beta_1 + \beta_2 D_i \qquad (1)$$


“The data come from Econometrics and Data Analysis for Developing Countries, by Chandan Mukherjee, Howard White, and 
Marc Wuyts, Routledge Press, London, 1998, in the Appendix. 
the relative change in Y (i.e., semielasticity), with respect to the dummy regressor taking values of 1 or 0, can be obtained as [(antilog of estimated β_2) − 1] × 100, that is, as

$$\left(e^{\hat{\beta}_2} - 1\right) \times 100 \qquad (2)$$

The proof is as follows: Since ln and exp (= e) are inverse functions, we can write Eq. (1) as:

$$\ln Y_i = \beta_1 + \ln\left(e^{\beta_2 D_i}\right) \qquad (3)$$

so that Y_i = exp(β_1)·exp(β_2 D_i). Now when D = 0, exp(β_2 D_i) = 1 and when D = 1, exp(β_2 D_i) = exp(β_2). Therefore, in going from state 0 to state 1, Y_i is multiplied by exp(β_2); that is, its relative change is (exp(β_2) − 1), which after multiplication by 100 becomes a percentage change. Hence the percentage change is (exp(β_2) − 1) × 100, as claimed. (Note: ln e = 1, that is, the log of e to base e is 1, just as the log of 10 to base 10 is 1. Recall that log to base e is called the natural log and that log to base 10 is called the common log.)
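As a quick numerical check of Eq. (2), this Python sketch compares the naive reading 100·β̂_2 with the exact (e^β̂_2 − 1) × 100 for a few hypothetical coefficient values; the function name `dummy_percent_effect` is illustrative.

```python
import math

def dummy_percent_effect(beta2_hat):
    """Exact percentage change in Y when D moves from 0 to 1 in ln Y = b1 + b2*D (Eq. 2)."""
    return (math.exp(beta2_hat) - 1) * 100

# Hypothetical coefficient values; compare the naive 100*b2 with the exact figure.
for b2 in (0.05, 0.2, -0.3):
    print(f"b2 = {b2:5.2f}: naive {100 * b2:6.2f}%, exact {dummy_percent_effect(b2):6.2f}%")
```

For β̂_2 = 0.2 the exact effect is about 22.14 percent rather than 20; the gap between the two readings widens as |β̂_2| grows.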


RELAXING THE ASSUMPTIONS OF THE CLASSICAL MODEL


In Part 1 we considered at length the classical normal linear regression model and showed how it can be used to handle the twin problems of statistical inference, namely, estimation and hypothesis testing, as well as the problem of prediction. But recall that this model is based on several simplifying assumptions, which are as follows.

Assumption 1. The regression model is linear in the parameters.
Assumption 2. The values of the regressors, the X’s, are fixed, or X values are independent of the error term. Here, this means we require zero covariance between u_i and each X variable.
Assumption 3. For given X’s, the mean value of disturbance u_i is zero.
Assumption 4. For given X’s, the variance of u_i is constant or homoscedastic.
Assumption 5. For given X’s, there is no autocorrelation, or serial correlation, between the disturbances.
Assumption 6. The number of observations n must be greater than the number of parameters to be estimated.
Assumption 7. There must be sufficient variation in the values of the X variables.

We are also including the following 3 assumptions in this part of the text:

Assumption 8. There is no exact collinearity between the X variables.
Assumption 9. The model is correctly specified, so there is no specification bias.
Assumption 10. The stochastic (disturbance) term u_i is normally distributed.


Before proceeding further, let us note that most textbooks list fewer than 10 assumptions. For example, Assumptions 6 and 7 are taken for granted rather than spelled out explicitly. We decided to state them explicitly because distinguishing between the assumptions required for ordinary least squares (OLS) to have desirable statistical properties (such as BLUE) and the conditions required for OLS to be useful seems sensible. For example, OLS estimators are BLUE (best linear unbiased estimators) even if Assumption 7 is not satisfied.
But in that case the standard errors of the OLS estimators will be large relative to their coefficients (i.e., the 
t ratios will be small), thereby making it difficult to assess the contribution of one or more regressors to the 
explained sum of squares. 

As Wetherill notes, in practice two major types of problems arise in applying the classical linear regression model: (1) those due to assumptions about the specification of the model and about the disturbances u_i and (2) those due to assumptions about the data.¹ In the first category are Assumptions 1, 2, 3, 4, 5, 9, and 10. Those
in the second category include Assumptions 6, 7, and 8. In addition, data problems, such as outliers (unusual 
or untypical observations) and errors of measurement in the data, also fall into the second category. 

With respect to problems arising from the assumptions about disturbances and model specifications, three major questions arise: (1) How severe must the departure be from a particular assumption before it really matters? For example, if u_i are not exactly normally distributed, what level of departure from this assumption can one accept before the usual t and F tests based on the OLS estimators become unreliable? (2) How do we find out
whether a particular assumption is in fact violated in a concrete case? Thus, how does one find out if the disturbances are normally distributed in a given application? We have already discussed the Anderson–Darling A² statistic and Jarque–Bera tests of normality. (3) What remedial measures can we take if one or
more of the assumptions are false? For example, if the assumption of homoscedasticity is found to be false in 
an application, what do we do then? 

With regard to problems attributable to assumptions about the data, we also face similar questions. (1) 
How serious is a particular problem? For example, is multicollinearity so severe that it makes estimation and 
inference very difficult? (2) How do we find out the severity of the data problem? For example, how do we 
decide whether the inclusion or exclusion of an observation or observations that may represent outliers will 
make a tremendous difference in the analysis? (3) Can some of the data problems be easily remedied? For 
example, can one have access to the original data to find out the sources of errors of measurement in the data? 

Unfortunately, satisfactory answers cannot be given to all these questions. In the rest of Part 2 we will 
look at some of the assumptions more critically, but not all will receive full scrutiny. In particular, we will not 
discuss in depth the following: Assumptions 2, 3, and 10. The reasons are as follows: 


Assumption 2: Fixed versus Stochastic Regressors 


Remember that our regression analysis is based on the assumption that the regressors are nonstochastic and
assume fixed values in repeated sampling. There is a good reason for this strategy. Unlike scientists in the 
physical sciences, as noted in Chapter 1, economists generally have no control over the data they use. More 
often than not, economists depend on secondary data, that is, data collected by someone else, such as the 
government and private organizations. Therefore, the practical strategy to follow is to assume that for the 
problem at hand the values of the explanatory variables are given even though the variables themselves may 
be intrinsically stochastic or random. Hence, the results of the regression analysis are conditional upon these 
given values. 

But suppose that we cannot regard the X’s as truly nonstochastic or fixed. This is the case of random or stochastic regressors. Now the situation is rather involved. The u_i, by assumption, are stochastic. If the X’s too are stochastic, then we must specify how the X’s and u_i are distributed. If we are willing to make Assumption 2 (i.e., the X’s, although random, are distributed independently of, or at least uncorrelated with, u_i), then for all practical purposes we can continue to operate as if the X’s were nonstochastic. As Kmenta notes:²


¹G. Barrie Wetherill, Regression Analysis with Applications, Chapman and Hall, New York, 1986, pp. 14–15.
Thus, relaxing the assumption that X is nonstochastic and replacing it by the assumption that X is stochastic but
independent of [u] does not change the desirable properties and feasibility of least squares estimation. 


Accordingly, we will retain Assumption 2 until we come to deal with simultaneous equations models in Part 4.³ Also, a brief discussion of stochastic regressors will be given in Chapter 13.


Assumption 3: Zero Mean Value of u_i
Recall the k-variable linear regression model:
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \qquad (1)$$
Let us now assume that
$$E(u_i \mid X_{2i}, X_{3i}, \ldots, X_{ki}) = w \qquad (2)$$


where w is a constant; note in the standard model w = 0, but now we let it be any constant. 
Taking the conditional expectation of Eq. (1), we obtain 


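$$\begin{aligned}
E(Y_i \mid X_{2i}, X_{3i}, \ldots, X_{ki}) &= \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + w \\
&= (\beta_1 + w) + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} \qquad (3)\\
&= \alpha + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki}
\end{aligned}$$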


where α = (β_1 + w) and where in taking the expectations one should note that the X’s are treated as constants. (Why?)

Therefore, if Assumption 3 is not fulfilled, we see that we cannot estimate the original intercept β_1; what we obtain is α, which contains β_1 and E(u_i) = w. In short, we obtain a biased estimate of β_1.

But as we have noted on many occasions, in many practical situations the intercept term, β_1, is of little importance; the more meaningful quantities are the slope coefficients, which remain unaffected even if Assumption 3 is violated.⁴ Besides, in many applications the intercept term has no physical interpretation.
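A small simulation makes the point concrete. The Python sketch below uses hypothetical values β_1 = 1, β_2 = 2, w = 5 and a fixed grid of X values on [0, 10); it draws errors with mean w rather than zero and fits a two-variable OLS regression. The slope estimate stays close to β_2, while the intercept estimate lands near α = β_1 + w rather than β_1.

```python
import random

random.seed(42)
beta1, beta2, w = 1.0, 2.0, 5.0     # hypothetical true values; E(u_i) = w, not 0
n = 10_000
x = [10 * i / n for i in range(n)]  # fixed (nonstochastic) regressor values on [0, 10)
y = [beta1 + beta2 * xi + random.gauss(w, 1.0) for xi in x]

# Two-variable OLS estimates.
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b2 = sxy / sxx
b1 = y_bar - b2 * x_bar

print(f"slope     {b2:.3f}  (true beta2 = {beta2})")
print(f"intercept {b1:.3f}  (alpha = beta1 + w = {beta1 + w})")
```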


Assumption 10: Normality of u 


This assumption is not essential if our objective is estimation only. As noted in Chapter 3, the OLS estimators are BLUE regardless of whether the u_i are normally distributed or not. With the normality assumption, however, we were able to establish that the OLS estimators of the regression coefficients follow the normal distribution, that (n − k)σ̂²/σ² has the χ² distribution, and that one could use the t and F tests to test various statistical hypotheses regardless of the sample size.

But what happens if the u_i are not normally distributed? We then rely on the following extension of
the central limit theorem; recall that it was the central limit theorem we invoked to justify the normality 
assumption in the first place: 


²Jan Kmenta, Elements of Econometrics, 2d ed., Macmillan, New York, 1986, p. 338. (Emphasis in the original.)


³A technical point may be noted here. Instead of the strong assumption that the X’s and u are independent, we may use the
weaker assumption that the values of X variables and u are uncorrelated contemporaneously (i.e., at the same point in time). 
In this case OLS estimators may be biased but they are consistent, that is, as the sample size increases indefinitely, the 
estimators converge on their true values. If, however, the X's and u are contemporaneously correlated, the OLS estimators 
are biased as well as inconsistent. In Chapter 17 we will show how the method of instrumental variables can sometimes 
be used to obtain consistent estimators in this situation. 

⁴It is very important to note that this statement is true only if E(u_i) = w for each i. However, if E(u_i) = w_i, that is, a different constant for each i, the partial slope coefficients may be biased as well as inconsistent. In this case violation of Assumption
3 will be critical. For proof and further details, see Peter Schmidt, Econometrics, Marcel Dekker, New York, 1976, pp. 36-39. 
If the disturbances [u_i] are independently and identically distributed with zero mean and [constant] variance σ² and if the explanatory variables are constant in repeated samples, the [O]LS coefficient estimators are asymptotically normally distributed with means equal to the corresponding β’s.⁵


Therefore, the usual test procedures (the t and F tests) are still valid asymptotically, that is, in the large sample, but not in the finite or small samples.

The fact that if the disturbances are not normally distributed the OLS estimators are still normally 
distributed asymptotically (under the assumption of homoscedastic variance and fixed X’s) is of little comfort 
to practicing economists, who often do not have the luxury of large-sample data. Therefore, the normality 
assumption becomes extremely important for the purposes of hypothesis testing and prediction. Hence, with 
the twin problems of estimation and hypothesis testing in mind, and given the fact that small samples are the 
rule rather than the exception in most economic analyses, we shall continue to use the normality assumption.⁶
(But see Chapter 13, Section 13.12.) 

Of course, this means that when we deal with a finite sample, we must explicitly test for the normality 
assumption. We have already considered the Anderson–Darling and the Jarque–Bera tests of normality. The reader is strongly urged to apply these or other tests of normality to regression residuals. Keep in mind that in finite samples without the normality assumption the usual t and F statistics may not follow the t and F distributions.
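For readers who want to apply the Jarque–Bera test to their own residuals, here is a minimal Python sketch of the statistic JB = n[S²/6 + (K − 3)²/24], where S and K are the moment-based skewness and kurtosis of the residuals. Under normality JB is asymptotically χ² with 2 df (5 percent critical value about 5.99). Because it is a large-sample test, the five hypothetical residuals used here serve only to show the arithmetic; the function name `jarque_bera` is illustrative.

```python
def jarque_bera(residuals):
    """JB = n * [S^2/6 + (K - 3)^2/24] from moment-based skewness S and kurtosis K."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5       # S
    kurt = m4 / m2 ** 2         # K (equals 3 for a normal distribution)
    return n * (skew ** 2 / 6 + (kurt - 3) ** 2 / 24)

resid = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical residuals
print(jarque_bera(resid))
```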

We are left with Assumptions 1, 4, 5, 6, 7, 8, and 9. Assumptions 6, 7, and 8 are closely related and are 
discussed in the chapter on multicollinearity (Chapter 10). Assumption 4 is discussed in the chapter on 
heteroscedasticity (Chapter 11). Assumption 5 is discussed in the chapter on autocorrelation (Chapter 12). 
Assumption 9 is discussed in the chapter on model specification and diagnostic testing (Chapter 13). Because 
of its specialized nature and mathematical demands, Assumption 1 is discussed as a special topic in Part 3 (Chapter 14).

For pedagogical reasons, in each of these chapters we follow a common format, namely, (1) identify the 
nature of the problem, (2) examine its consequences, (3) suggest methods of detecting it, and (4) consider 
remedial measures so that they may lead to estimators that possess the desirable statistical properties discussed 
in Part 1. 

A cautionary note is in order: As noted earlier, satisfactory answers to all the problems arising out of the 
violation of the assumptions of the classical linear regression model (CLRM) do not exist. Moreover, there 
may be more than one solution to a particular problem, and often it is not clear which method is best. Besides, 
in a particular application more than one violation of the CLRM may be involved. Thus, specification bias, 
multicollinearity, and heteroscedasticity may coexist in an application, and there is no single omnipotent 
test that will solve all the problems simultaneously.⁷ Furthermore, a particular test that was popular at one
time may not be in vogue later because somebody found a flaw in the earlier test. But this is how science 
progresses. Econometrics is no exception. 


⁵Henri Theil, Introduction to Econometrics, Prentice-Hall, Englewood Cliffs, NJ, 1978, p. 240. It must be noted that the assumptions of fixed X’s and constant σ² are crucial for this result.


⁶In passing, note that the effects of departure from normality and related topics are often discussed under the topic of
robust estimation in the literature, a topic beyond the scope of this book. 


⁷This is not for lack of trying. See A. K. Bera and C. M. Jarque, “Efficient Tests for Normality, Homoscedasticity and Serial Independence of Regression Residuals: Monte Carlo Evidence,” Economics Letters, vol. 7, 1981, pp. 313–318.


CHAPTER 10


Multicollinearity: What Happens If the Regressors are Correlated?


There is no pair of words that is more misused both in econometrics texts and in the applied literature than the
pair “multi-collinearity problem.” That many of our explanatory variables are highly collinear is a fact of life. 
And it is completely clear that there are experimental designs X’X [i.e., data matrix] which would be much 
preferred to the designs the natural experiment has provided us [i.e., the sample at hand]. But a complaint about the 
apparent malevolence of nature is not at all constructive, and the ad hoc cures for a bad design, such as stepwise 
regression or ridge regression, can be disastrously inappropriate. Better that we should rightly accept the fact that 
our non-experiments [i.e., data not collected by designed experiments] are sometimes not very informative about
parameters of interest.¹


Assumption 8 of the classical linear regression model (CLRM) is that there is no multicollinearity among 
the regressors included in the regression model. In this chapter we take a critical look at this assumption by 
seeking answers to the following questions: 


1. What is the nature of multicollinearity?
2. Is multicollinearity really a problem?
3. What are its practical consequences?
4. How does one detect it?
5. What remedial measures can be taken to alleviate the problem of multicollinearity?


In this chapter we also discuss Assumption 6 of the CLRM, namely, that the number of observations 
in the sample must be greater than the number of regressors, and Assumption 7, which requires that there 


¹Edward E. Leamer, “Model Choice and Specification Analysis,” in Zvi Griliches and Michael D. Intriligator, eds., Handbook of Econometrics, vol. 1, North Holland Publishing Company, Amsterdam, 1983, pp. 300–301.
be sufficient variability in the values of the regressors, for they are intimately related to the assumption of no multicollinearity. Arthur Goldberger has christened Assumption 6 as the problem of micronumerosity,² which simply means small sample size.


10.1 The Nature of Multicollinearity 


The term multicollinearity is due to Ragnar Frisch.³ Originally it meant the existence of a “perfect,” or exact, linear relationship among some or all explanatory variables of a regression model.⁴ For the k-variable regression involving explanatory variables X_1, X_2, ..., X_k (where X_1 = 1 for all observations to allow for the intercept term), an exact linear relationship is said to exist if the following condition is satisfied:

$$\lambda_1 X_{1i} + \lambda_2 X_{2i} + \cdots + \lambda_k X_{ki} = 0 \qquad (10.1.1)$$

where λ_1, λ_2, ..., λ_k are constants such that not all of them are zero simultaneously.⁵

Today, however, the term multicollinearity is used in a broader sense to include the case of perfect multicollinearity, as shown by Eq. (10.1.1), as well as the case where the X variables are intercorrelated but not perfectly so,⁶ as follows:

$$\lambda_1 X_{1i} + \lambda_2 X_{2i} + \cdots + \lambda_k X_{ki} + v_i = 0 \qquad (10.1.2)$$

where v_i is a stochastic error term.
To see the difference between perfect and less than perfect multicollinearity, assume, for example, that λ_2 ≠ 0. Then, Eq. (10.1.1) can be written as
$$X_{2i} = -\frac{\lambda_1}{\lambda_2}X_{1i} - \frac{\lambda_3}{\lambda_2}X_{3i} - \cdots - \frac{\lambda_k}{\lambda_2}X_{ki} \qquad (10.1.3)$$
which shows how X_2 is exactly linearly related to other variables or how it can be derived from a linear combination of other X variables. In this situation, the coefficient of correlation between the variable X_2 and the linear combination on the right side of Eq. (10.1.3) is bound to be unity.
Similarly, if λ_2 ≠ 0, Eq. (10.1.2) can be written as
$$X_{2i} = -\frac{\lambda_1}{\lambda_2}X_{1i} - \frac{\lambda_3}{\lambda_2}X_{3i} - \cdots - \frac{\lambda_k}{\lambda_2}X_{ki} - \frac{1}{\lambda_2}v_i \qquad (10.1.4)$$
which shows that X_2 is not an exact linear combination of other X’s because it is also determined by the stochastic error term v_i.


2See his A Course in Econometrics, Harvard University Press, Cambridge, Mass., 1991, p. 249. 


3Ragnar Frisch, Statistical Confluence Analysis by Means of Complete Regression Systems, Institute of Economics, Oslo Univer- 
sity, publ. no. 5, 1934. 


4Strictly speaking, multicollinearity refers to the existence of more than one exact linear relationship, and collinearity refers to the existence of a single linear relationship. But this distinction is rarely maintained in practice, and multicollinearity refers to both cases.


5The chances of one’s obtaining a sample of values where the regressors are related in this fashion are indeed very small in practice except by design when, for example, the number of observations is smaller than the number of regressors or if one falls into the “dummy variable trap” as discussed in Chapter 9. See Exercise 10.2.


6If there are only two explanatory variables, intercorrelation can be measured by the zero-order or simple correlation coefficient. But if there are more than two X variables, intercorrelation can be measured by the partial correlation coefficients or by the multiple correlation coefficient R of one X variable with all other X variables taken together.
As a numerical example, consider the following hypothetical data:

X2     X3     X3*
10     50      52
15     75      75
18     90      97
24    120     129
30    150     152

It is apparent that $X_{3i} = 5X_{2i}$. Therefore, there is perfect collinearity between $X_2$ and $X_3$ since the coefficient of correlation $r_{23}$ is unity. The variable $X_3^*$ was created from $X_3$ by simply adding to it the following numbers, which were taken from a table of random numbers: 2, 0, 7, 9, 2. Now there is no longer perfect collinearity between $X_2$ and $X_3^*$. However, the two variables are highly correlated because calculations will show that the coefficient of correlation between them is 0.9959.
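The "calculations" mentioned above can be checked in a few lines. This is a minimal numpy sketch that rebuilds $X_3$ and $X_3^*$ from the hypothetical table and computes both simple correlation coefficients; the variable names are our own.

```python
import numpy as np

# Hypothetical data of Section 10.1
X2 = np.array([10, 15, 18, 24, 30], dtype=float)
X3 = 5.0 * X2                                       # exact relation X3 = 5 * X2
X3_star = X3 + np.array([2.0, 0.0, 7.0, 9.0, 2.0])   # X3 plus the random numbers 2, 0, 7, 9, 2

r23 = np.corrcoef(X2, X3)[0, 1]            # perfect collinearity
r23_star = np.corrcoef(X2, X3_star)[0, 1]  # high but imperfect collinearity

print("X3*        =", X3_star)
print(f"r(X2, X3)  = {r23:.4f}")
print(f"r(X2, X3*) = {r23_star:.4f}")
```

The first coefficient is 1 up to floating-point error, while the second is just below 1, which is exactly the distinction between Eqs. (10.1.1) and (10.1.2).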

The preceding algebraic approach to multicollinearity can be portrayed succinctly by the Ballentine (recall Figure 3.8, reproduced in Figure 10.1). In this figure the circles Y, $X_2$, and $X_3$ represent, respectively, the variations in Y (the dependent variable) and $X_2$ and $X_3$ (the explanatory variables). The degree of collinearity can be measured by the extent of the overlap (shaded area) of the $X_2$ and $X_3$ circles. In Figure 10.1a there is no overlap between $X_2$ and $X_3$, and hence no collinearity. In Figures 10.1b through 10.1e there is a “low” to “high” degree of collinearity: the greater the overlap between $X_2$ and $X_3$ (i.e., the larger the shaded area), the higher the degree of collinearity. In the extreme, if $X_2$ and $X_3$ were to overlap completely (or if $X_2$ were completely inside $X_3$, or vice versa), collinearity would be perfect.

[Figure 10.1 The Ballentine view of multicollinearity. Panels: (a) No collinearity; (b) Low collinearity; (c) Moderate collinearity; (d) High collinearity; (e) Very high collinearity.]

In passing, note that multicollinearity, as we have defined it, refers only to linear relationships among the 
X variables. It does not rule out nonlinear relationships among them. For example, consider the following 
regression model: 


$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + u_i \qquad (10.1.5)$$

where, say, Y = total cost of production and X = output. The variables $X_i^2$ (output squared) and $X_i^3$ (output cubed) are obviously functionally related to $X_i$, but the relationship is nonlinear. Strictly, therefore, models such as Eq. (10.1.5) do not violate the assumption of no multicollinearity. However, in concrete applications, the conventionally measured correlation coefficient will show $X_i$, $X_i^2$, and $X_i^3$ to be highly correlated, which, as we shall show, will make it difficult to estimate the parameters of Eq. (10.1.5) with greater precision (i.e., with smaller standard errors).
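To see how strongly polynomial terms can be correlated even though the relation among them is nonlinear, the following sketch computes the ordinary (linear) correlation matrix of $X$, $X^2$, and $X^3$. The output levels 1, 2, ..., 10 are a hypothetical assumption for illustration, not data from the text.

```python
import numpy as np

# Hypothetical output levels (an assumption, not data from the text)
X = np.arange(1, 11, dtype=float)

# Rows: X, X^2, X^3 as they would enter Eq. (10.1.5)
regressors = np.vstack([X, X**2, X**3])

R = np.corrcoef(regressors)   # 3 x 3 matrix of simple correlation coefficients
print(np.round(R, 4))
```

All off-diagonal entries come out well above 0.9 for this range, so in practice such a cost function behaves much like a highly collinear model even though no exact linear relationship exists.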

Why does the classical linear regression model assume that there is no multicollinearity among the X’s? 
The reasoning is this: If multicollinearity is perfect in the sense of Eq. (10.1.1), the regression coeffi- 
cients of the X variables are indeterminate and their standard errors are infinite. If multicollinearity 
is less than perfect, as in Eq. (10.1.2), the regression coefficients, although determinate, possess large 
standard errors (in relation to the coefficients themselves), which means the coefficients cannot be 
estimated with great precision or accuracy. The proofs of these statements are given in the following 
sections. 

There are several sources of multicollinearity. As Montgomery and Peck note, multicollinearity may be due to the following factors:⁷

1. The data collection method employed. For example, sampling over a limited range of the values taken 
by the regressors in the population. 

2. Constraints on the model or in the population being sampled. For example, in the regression of electricity consumption on income ($X_2$) and house size ($X_3$) there is a physical constraint in the population in that families with higher incomes generally have larger homes than families with lower incomes.

3. Model specification. For example, adding polynomial terms to a regression model, especially when the range of the X variable is small.

4. An overdetermined model. This happens when the model has more explanatory variables than the 
number of observations. This could happen in medical research where there may be a small number of 
patients about whom information is collected on a large number of variables. 

An additional reason for multicollinearity, especially in time series data, may be that the regressors included in the model share a common trend, that is, they all increase or decrease over time. Thus, in the regression of consumption expenditure on income, wealth, and population, the regressors income, wealth, and population may all be growing over time at more or less the same rate, leading to collinearity among these variables.

7Douglas Montgomery and Elizabeth Peck, Introduction to Linear Regression Analysis, John Wiley & Sons, New York, 1982, pp. 289-290. See also R. L. Mason, R. F. Gunst, and J. T. Webster, “Regression Analysis and Problems of Multicollinearity,” Communications in Statistics A, vol. 4, no. 3, 1975, pp. 277-292; R. F. Gunst and R. L. Mason, “Advantages of Examining Multicollinearities in Regression Analysis,” Biometrics, vol. 33, 1977, pp. 249-260.


10.2 Estimation in the Presence of Perfect Multicollinearity


It was stated previously that in the case of perfect multicollinearity the regression coefficients remain indeterminate and their standard errors are infinite. This fact can be demonstrated readily in terms of the three-variable regression model. Using the deviation form, where all the variables are expressed as deviations from their sample means, we can write the three-variable regression model as

$$y_i = \hat\beta_2 x_{2i} + \hat\beta_3 x_{3i} + \hat u_i \qquad (10.2.1)$$


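Now from Chapter 7 we obtain

$$\hat\beta_2 = \frac{\left(\sum y_i x_{2i}\right)\left(\sum x_{3i}^2\right) - \left(\sum y_i x_{3i}\right)\left(\sum x_{2i}x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i}x_{3i}\right)^2} \qquad (7.4.7)$$

$$\hat\beta_3 = \frac{\left(\sum y_i x_{3i}\right)\left(\sum x_{2i}^2\right) - \left(\sum y_i x_{2i}\right)\left(\sum x_{2i}x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i}x_{3i}\right)^2}$$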
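Assume that $X_{3i} = \lambda X_{2i}$, where λ is a nonzero constant (e.g., 2, 4, 1.8, etc.); in deviation form this implies $x_{3i} = \lambda x_{2i}$ as well. Substituting this into Eq. (7.4.7), we obtain

$$\hat\beta_2 = \frac{\left(\sum y_i x_{2i}\right)\left(\lambda^2 \sum x_{2i}^2\right) - \left(\lambda \sum y_i x_{2i}\right)\left(\lambda \sum x_{2i}^2\right)}{\left(\sum x_{2i}^2\right)\left(\lambda^2 \sum x_{2i}^2\right) - \lambda^2\left(\sum x_{2i}^2\right)^2} = \frac{0}{0} \qquad (10.2.2)$$

which is an indeterminate expression. The reader can verify that $\hat\beta_3$ is also indeterminate.⁸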

Why do we obtain the result shown in Eq. (10.2.2)? Recall the meaning of $\hat\beta_2$: It gives the rate of change in the average value of Y as $X_2$ changes by a unit, holding $X_3$ constant. But if $X_3$ and $X_2$ are perfectly collinear, there is no way $X_3$ can be kept constant: As $X_2$ changes, so does $X_3$ by the factor λ. What it means, then, is that there is no way of disentangling the separate influences of $X_2$ and $X_3$ from the given sample: For practical purposes $X_2$ and $X_3$ are indistinguishable. In applied econometrics this problem is most damaging since the entire intent is to separate the partial effects of each X upon the dependent variable.

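To see this differently, let us substitute $x_{3i} = \lambda x_{2i}$ into Eq. (10.2.1) and obtain the following (compare the discussion of the no-collinearity assumption in Chapter 7):

$$\begin{aligned}
y_i &= \hat\beta_2 x_{2i} + \hat\beta_3(\lambda x_{2i}) + \hat u_i \\
&= (\hat\beta_2 + \lambda\hat\beta_3)x_{2i} + \hat u_i \qquad (10.2.3) \\
&= \hat\alpha x_{2i} + \hat u_i
\end{aligned}$$

where

$$\hat\alpha = (\hat\beta_2 + \lambda\hat\beta_3) \qquad (10.2.4)$$

Applying the usual OLS formula to Eq. (10.2.3), we get

$$\hat\alpha = (\hat\beta_2 + \lambda\hat\beta_3) = \frac{\sum x_{2i} y_i}{\sum x_{2i}^2} \qquad (10.2.5)$$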


Therefore, although we can estimate α uniquely, there is no way to estimate $\beta_2$ and $\beta_3$ uniquely; mathematically

$$\hat\alpha = \hat\beta_2 + \lambda\hat\beta_3 \qquad (10.2.6)$$

gives us only one equation in two unknowns (note λ is given) and there is an infinity of solutions to Eq. (10.2.6) for given values of $\hat\alpha$ and λ. To put this idea in concrete terms, let $\hat\alpha = 0.8$ and λ = 2. Then we have

$$0.8 = \hat\beta_2 + 2\hat\beta_3 \qquad (10.2.7)$$

or

$$\hat\beta_2 = 0.8 - 2\hat\beta_3 \qquad (10.2.8)$$

Now choose a value of $\hat\beta_3$ arbitrarily, and we will have a solution for $\hat\beta_2$. Choose another value for $\hat\beta_3$, and we will have another solution for $\hat\beta_2$. No matter how hard we try, there is no unique value for $\hat\beta_2$.

8Another way of seeing this is as follows: By definition, the coefficient of correlation between $X_2$ and $X_3$, $r_{23}$, is $\sum x_{2i}x_{3i} \big/ \sqrt{\sum x_{2i}^2 \sum x_{3i}^2}$. If $r_{23}^2 = 1$, i.e., perfect collinearity between $X_2$ and $X_3$, the denominator of Eq. (7.4.7) will be zero, making estimation of $\beta_2$ (or of $\beta_3$) impossible.

The upshot of the preceding discussion is that in the case of perfect multicollinearity one cannot get a unique solution for the individual regression coefficients. But notice that one can get a unique solution for linear combinations of these coefficients. The linear combination $(\beta_2 + \lambda\beta_3)$ is uniquely estimated by $\hat\alpha$, given the value of λ.⁹

In passing, note that in the case of perfect multicollinearity the variances and standard errors of $\hat\beta_2$ and $\hat\beta_3$ individually are infinite. (See Exercise 10.21.)
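The indeterminacy and the estimable function can be illustrated numerically. The sketch below assumes a small hypothetical sample (the Y values are invented for illustration) in which $X_3 = 2X_2$ exactly. It shows that the denominator of Eq. (7.4.7) is zero, that $\hat\alpha$ of Eq. (10.2.5) is still computable, and that different $(\hat\beta_2, \hat\beta_3)$ pairs satisfying Eq. (10.2.6) produce identical fitted values.

```python
import numpy as np

# Hypothetical sample (not from the text): X3 = 2 * X2 exactly, so lambda = 2
lam = 2.0
X2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X3 = lam * X2
Y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

# Deviation form, as in Eq. (10.2.1)
x2, x3, y = X2 - X2.mean(), X3 - X3.mean(), Y - Y.mean()

# Denominator of Eq. (7.4.7): zero under perfect collinearity
denominator = (x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2
print("denominator of (7.4.7):", denominator)

# The estimable function alpha = beta2 + lambda * beta3, Eq. (10.2.5)
alpha_hat = (x2 @ y) / (x2 @ x2)
print("alpha_hat:", alpha_hat)

# Any (beta2, beta3) pair satisfying Eq. (10.2.6) gives the same fit
fits = []
for beta3 in (0.0, 1.0, 5.0):          # arbitrary choices of beta3
    beta2 = alpha_hat - lam * beta3     # as in Eq. (10.2.8)
    fitted = beta2 * x2 + beta3 * x3
    fits.append(fitted)
    rss = np.sum((y - fitted) ** 2)
    print(f"beta2 = {beta2:8.3f}, beta3 = {beta3:4.1f}, RSS = {rss:.4f}")
```

Every choice of $\hat\beta_3$ leaves the residual sum of squares unchanged, so the data cannot discriminate among them; only $\hat\alpha$ is pinned down.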


10.3 Estimation in the Presence of “High” but “Imperfect” Multicollinearity


The perfect multicollinearity situation is a pathological extreme. Generally, there is no exact linear relationship 
among the X variables, especially in data involving economic time series. Thus, turning to the three-variable 
model in the deviation form given in Eq. (10.2.1), instead of exact multicollinearity, we may have 


$$x_{3i} = \lambda x_{2i} + v_i \qquad (10.3.1)$$

where λ ≠ 0 and where $v_i$ is a stochastic error term such that $\sum x_{2i} v_i = 0$. (Why?)
Incidentally, the Ballentines shown in Figures 10.1b to 10.1e represent cases of imperfect collinearity.


In this case, estimation of regression coefficients $\beta_2$ and $\beta_3$ may be possible. For example, substituting Eq. (10.3.1) into Eq. (7.4.7), we obtain

$$\hat\beta_2 = \frac{\left(\sum y_i x_{2i}\right)\left(\lambda^2\sum x_{2i}^2 + \sum v_i^2\right) - \left(\lambda\sum y_i x_{2i} + \sum y_i v_i\right)\left(\lambda\sum x_{2i}^2\right)}{\sum x_{2i}^2\left(\lambda^2\sum x_{2i}^2 + \sum v_i^2\right) - \left(\lambda\sum x_{2i}^2\right)^2} \qquad (10.3.2)$$

where use is made of $\sum x_{2i} v_i = 0$. A similar expression can be derived for $\hat\beta_3$.
Now, unlike Eq. (10.2.2), there is no reason to believe a priori that Eq. (10.3.2) cannot be estimated. Of course, if $v_i$ is sufficiently small, say, very close to zero, Eq. (10.3.1) will indicate almost perfect collinearity and we shall be back to the indeterminate case of Eq. (10.2.2).
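Expanding the denominator of Eq. (10.3.2) gives $\sum x_{2i}^2(\lambda^2\sum x_{2i}^2 + \sum v_i^2) - \lambda^2(\sum x_{2i}^2)^2 = \sum x_{2i}^2 \sum v_i^2$, which makes the last remark concrete: the denominator shrinks toward zero with $v_i$. The sketch below checks this on simulated data. The sample size, seed, and λ = 2 are arbitrary assumptions, and $v_i$ is constructed to be exactly orthogonal to $x_{2i}$ so that $\sum x_{2i} v_i = 0$ holds.

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed; all data are simulated
n, lam = 50, 2.0

x2 = rng.normal(size=n)
x2 -= x2.mean()                     # deviation form

e = rng.normal(size=n)
e -= e.mean()
e -= (e @ x2) / (x2 @ x2) * x2      # make e orthogonal to x2, so sum(x2 * v) = 0

denoms, shortcuts = [], []
for scale in (1.0, 0.1, 0.001):
    v = scale * e
    x3 = lam * x2 + v                                  # Eq. (10.3.1)
    denom = (x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2     # denominator of Eq. (7.4.7)
    shortcut = (x2 @ x2) * (v @ v)                     # sum x2^2 * sum v^2
    denoms.append(denom)
    shortcuts.append(shortcut)
    print(f"scale = {scale:<6} denominator = {denom:12.6f}   sum(x2^2)*sum(v^2) = {shortcut:12.6f}")
```

As the scale of $v_i$ falls, the two columns agree and both approach zero, which is the "back to the indeterminate case" situation described in the text.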


10.4 Multicollinearity: Much Ado about Nothing? 
Theoretical Consequences of Multicollinearity 


Recall that if the assumptions of the classical model are satisfied, the OLS estimators of the regression coefficients are BLUE (or BUE, if the normality assumption is added). Now it can be shown that even if multicollinearity is very high, as in the case of near multicollinearity, the OLS estimators still retain the property of BLUE.¹⁰ Then what is the multicollinearity fuss all about? As Christopher Achen remarks (note also the Leamer quote at the beginning of this chapter):


Beginning students of methodology occasionally worry that their independent variables are correlated—the so-called multicollinearity problem. But multicollinearity violates no regression assumptions. Unbiased, consistent estimates will occur, and their standard errors will be correctly estimated. The only effect of multicollinearity is to make it hard to get coefficient estimates with small standard error. But having a small number of observations also has that effect, as does having independent variables with small variances. (In fact, at a theoretical level, multicollinearity, few observations and small variances on the independent variables are essentially all the same problem.) Thus “What should I do about multicollinearity?” is a question like “What should I do if I don’t have many observations?” No statistical answer can be given.¹¹

9In econometric literature, a function such as $(\beta_2 + \lambda\beta_3)$ is known as an estimable function.


To drive home the importance of sample size, Goldberger coined the term micronumerosity, to counter 
the exotic polysyllabic name multicollinearity. According to Goldberger, exact micronumerosity (the 
counterpart of exact multicollinearity) arises when n, the sample size, is zero, in which case any kind of 
estimation is impossible. Near micronumerosity, like near multicollinearity, arises when the number of obser- 
vations barely exceeds the number of parameters to be estimated. 

Leamer, Achen, and Goldberger are right in bemoaning the lack of attention given to the sample size 
problem and the undue attention to the multicollinearity problem. Unfortunately, in applied work involving 
secondary data (i.e., data collected by some agency, such as the GNP data collected by the government), an 
individual researcher may not be able to do much about the size of the sample data and may have to face 
“estimating problems important enough to warrant our treating it [i.e., multicollinearity] as a violation of the CLR [classical linear regression] model.”¹²

First, it is true that even in the case of near multicollinearity the OLS estimators are unbiased. But 
unbiasedness is a multisample or repeated sampling property. What it means is that, keeping the values of 
the X variables fixed, if one obtains repeated samples and computes the OLS estimators for each of these 
samples, the average of the resulting estimates will converge to the true population values of the parameters as the
number of samples increases. But this says nothing about the properties of estimators in any given sample. 

Second, it is also true that collinearity does not destroy the property of minimum variance: In the class of 
all linear unbiased estimators, the OLS estimators have minimum variance; that is, they are efficient. But this 
does not mean that the variance of an OLS estimator will necessarily be small (in relation to the value of the 
estimator) in any given sample, as we shall demonstrate shortly. 
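The first two points can be illustrated with a small Monte Carlo experiment. The sketch below is a simulation under assumed settings (n = 20, 5,000 repeated samples, true coefficients 1, 0.5, 0.5, standard normal errors), not an example from the text. The X values are drawn once and then held fixed, as in the repeated-sampling interpretation of unbiasedness, and only the error term is redrawn in each sample.

```python
import numpy as np

rng = np.random.default_rng(42)     # arbitrary seed; all settings are hypothetical
n, reps = 20, 5000
beta = np.array([1.0, 0.5, 0.5])    # assumed true beta1, beta2, beta3

def beta2_draws(rho):
    """OLS estimates of beta2 across repeated samples with the X values held fixed."""
    z1, z2 = rng.normal(size=(2, n))
    X2 = z1
    X3 = rho * z1 + np.sqrt(1.0 - rho**2) * z2   # population correlation rho with X2
    X = np.column_stack([np.ones(n), X2, X3])    # fixed design across samples
    U = rng.normal(size=(n, reps))               # a new error vector for each sample
    Y = (X @ beta)[:, None] + U                  # one column of Y per sample
    B = np.linalg.lstsq(X, Y, rcond=None)[0]     # 3 x reps matrix of OLS estimates
    return B[1]

results = {}
for rho in (0.0, 0.99):
    b2 = beta2_draws(rho)
    results[rho] = (b2.mean(), b2.std())
    print(f"rho = {rho:4.2f}: mean of beta2-hat = {b2.mean():.3f}, std. dev. = {b2.std():.3f}")
```

In both designs the average of the estimates should sit near the true value 0.5 (unbiasedness), but with high collinearity the spread across samples is several times larger, so any single estimate may lie far from 0.5.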

Third, multicollinearity is essentially a sample (regression) phenomenon in the sense that, even if the 
X variables are not linearly related in the population, they may be so related in the particular sample at 
hand: When we postulate the theoretical or population regression function (PRF), we believe that all the X 
variables included in the model have a separate or independent influence on the dependent variable Y. But it 
may happen that in any given sample that is used to test the PRF some or all of the X variables are so highly 
collinear that we cannot isolate their individual influence on Y. So to speak, our sample lets us down, although 
the theory says that all the X’s are important. In short, our sample may not be “rich” enough to accommodate 
all X variables in the analysis. 


10Since near multicollinearity per se does not violate the other assumptions listed in Chapter 7, the OLS estimators are BLUE as indicated there.

11Christopher H. Achen, Interpreting and Using Regression, Sage Publications, Beverly Hills, Calif., 1982, pp. 82-83.

12Peter Kennedy, A Guide to Econometrics, 3d ed., The MIT Press, Cambridge, Mass., 1992, p. 177.
As an illustration, reconsider the consumption-income example of Chapter 3 (Example 3.1). Economists theorize that, besides income, the wealth of the consumer is also an important determinant of consumption expenditure. Thus, we may write

$$\text{Consumption}_i = \beta_1 + \beta_2\,\text{Income}_i + \beta_3\,\text{Wealth}_i + u_i$$

Now it may happen that when we obtain data on income and wealth, the two variables may be highly, if not perfectly, correlated: Wealthier people generally tend to have higher incomes. Thus, although in theory income and wealth are logical candidates to explain the behavior of consumption expenditure, in practice (i.e., in the sample) it may be difficult to disentangle the separate influences of income and wealth on consumption expenditure.

Ideally, to assess the individual effects of wealth and income on consumption expenditure we need a sufficient number of sample observations of wealthy individuals with low income, and high-income individuals with low wealth (recall Assumption 7). Although this may be possible in cross-sectional studies (by increasing the sample size), it is very difficult to achieve in aggregate time series work.

For all these reasons, the fact that the OLS estimators are BLUE despite multicollinearity is of little conso- 
lation in practice. We must see what happens or is likely to happen in any given sample, a topic discussed in 
the following section. 


10.5 Practical Consequences of Multicollinearity 


In cases of near or high multicollinearity, one is likely to encounter the following consequences: 


1. Although BLUE, the OLS estimators have large variances and covariances, making precise estimation 
difficult. 

2. Because of consequence 1, the confidence intervals tend to be much wider, leading to the acceptance of 
the “zero null hypothesis” (i.e., the true population coefficient is zero) more readily. 

3. Also because of consequence 1, the t ratio of one or more coefficients tends to be statistically insig- 
nificant. 

4. Although the t ratio of one or more coefficients is statistically insignificant, $R^2$, the overall measure of goodness of fit, can be very high.

5. The OLS estimators and their standard errors can be sensitive to small changes in the data. 


The preceding consequences can be demonstrated as follows. 


Large Variances and Covariances of OLS Estimators 


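To see large variances and covariances, recall that for the model (10.2.1) the variances and covariances of $\hat\beta_2$ and $\hat\beta_3$ are given by

$$\operatorname{var}(\hat\beta_2) = \frac{\sigma^2}{\sum x_{2i}^2\,(1 - r_{23}^2)} \qquad (7.4.12)$$

$$\operatorname{var}(\hat\beta_3) = \frac{\sigma^2}{\sum x_{3i}^2\,(1 - r_{23}^2)} \qquad (7.4.15)$$

$$\operatorname{cov}(\hat\beta_2, \hat\beta_3) = \frac{-r_{23}\,\sigma^2}{(1 - r_{23}^2)\sqrt{\sum x_{2i}^2}\sqrt{\sum x_{3i}^2}} \qquad (7.4.17)$$

where $r_{23}$ is the coefficient of correlation between $X_2$ and $X_3$.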
It is apparent from Eqs. (7.4.12) and (7.4.15) that as $r_{23}$ tends toward 1, that is, as collinearity increases, the variances of the two estimators increase and in the limit when $r_{23} = 1$, they are infinite. It is equally clear from Eq. (7.4.17) that as $r_{23}$ increases toward 1, the covariance of the two estimators also increases in absolute value. [Note: $\operatorname{cov}(\hat\beta_2, \hat\beta_3) = \operatorname{cov}(\hat\beta_3, \hat\beta_2)$.]


The speed with which variances and covariances increase can be seen with the variance-inflating factor (VIF), which is defined as

$$\text{VIF} = \frac{1}{1 - r_{23}^2} \qquad (10.5.1)$$

VIF shows how the variance of an estimator is inflated by the presence of multicollinearity. As $r_{23}^2$ approaches 1, the VIF approaches infinity. That is, as the extent of collinearity increases, the variance of an estimator increases, and in the limit it can become infinite. As can be readily seen, if there is no collinearity between $X_2$ and $X_3$, VIF will be 1.
Using this definition, we can express Eqs. (7.4.12) and (7.4.15) as

$$\operatorname{var}(\hat\beta_2) = \frac{\sigma^2}{\sum x_{2i}^2}\,\text{VIF} \qquad (10.5.2)$$

$$\operatorname{var}(\hat\beta_3) = \frac{\sigma^2}{\sum x_{3i}^2}\,\text{VIF} \qquad (10.5.3)$$

which show that the variances of $\hat\beta_2$ and $\hat\beta_3$ are directly proportional to the VIF.

To give some idea about how fast the variances and covariances increase as $r_{23}$ increases, consider Table 10.1, which gives these variances and covariances for selected values of $r_{23}$. As this table shows, increases in $r_{23}$ have a dramatic effect on the estimated variances and covariances of the OLS estimators. When $r_{23} = 0.50$, the $\operatorname{var}(\hat\beta_2)$ is 1.33 times the variance when $r_{23}$ is zero, but by the time $r_{23}$ reaches 0.95 it is about 10 times as high as when there is no collinearity. And lo and behold, an increase of $r_{23}$ from 0.95 to 0.995 makes the estimated variance 100 times that when collinearity is zero. The same dramatic effect is seen on the estimated covariance. All this can be seen in Figure 10.2.
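The magnifying factors in Table 10.1 follow directly from Eqs. (10.5.1) and (7.4.17). The sketch below recomputes them; `vif` and `cov_factor` are our own helper names. The covariance factor is the multiple of $B = -\sigma^2/\sqrt{\sum x_{2i}^2 \sum x_{3i}^2}$, namely $r_{23}/(1 - r_{23}^2)$.

```python
def vif(r23):
    """Variance-inflating factor, Eq. (10.5.1)."""
    return 1.0 / (1.0 - r23**2)

def cov_factor(r23):
    """Multiple of B = -sigma^2 / sqrt(sum x2^2 * sum x3^2) giving cov(beta2-hat, beta3-hat), Eq. (7.4.17)."""
    return r23 / (1.0 - r23**2)

r_values = [0.00, 0.50, 0.70, 0.80, 0.90, 0.95, 0.97, 0.99, 0.995, 0.999]
print(f"{'r23':>6} {'VIF':>8} {'cov factor':>11}")
for r in r_values:
    print(f"{r:6.3f} {vif(r):8.2f} {cov_factor(r):11.2f}")
```

Up to $r_{23} = 0.99$ the computed factors agree with Table 10.1 to about two decimals; at 0.995 and 0.999 the exact formulas give slightly larger values (e.g., VIF = 100.25 and 500.25) than the rounded figures shown in the table.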

The results just discussed can be easily extended to the k-variable model. In such a model, the variance of the jth coefficient, as noted in Eq. (7.5.6), can be expressed as:

$$\operatorname{var}(\hat\beta_j) = \frac{\sigma^2}{\sum x_j^2}\left(\frac{1}{1 - R_j^2}\right) \qquad (7.5.6)$$

where
$\hat\beta_j$ = (estimated) partial regression coefficient of regressor $X_j$
$R_j^2$ = $R^2$ in the regression of $X_j$ on the remaining (k − 2) regressors (Note: There are [k − 1] regressors in the k-variable regression model.)
$\sum x_j^2 = \sum (X_j - \bar X_j)^2$

We can also write Eq. (7.5.6) as

$$\operatorname{var}(\hat\beta_j) = \frac{\sigma^2}{\sum x_j^2}\,\text{VIF}_j \qquad (10.5.4)$$

As you can see from this expression, $\operatorname{var}(\hat\beta_j)$ is proportional to $\sigma^2$ and VIF but inversely proportional to $\sum x_j^2$. Thus, whether $\operatorname{var}(\hat\beta_j)$ is large or small will depend on the three ingredients: (1) $\sigma^2$, (2) VIF, and (3) $\sum x_j^2$. The last one, which ties in with Assumption 7 of the classical model, states that the larger the variability in a regressor, the smaller the variance of the coefficient of that regressor, assuming the other two ingredients are constant, and therefore the greater the precision with which that coefficient can be estimated.

Table 10.1 The Effect of Increasing r23 on var(β̂2) and cov(β̂2, β̂3)

Value of r23   VIF      var(β̂2)          var(β̂2)(r23 ≠ 0)/var(β̂2)(r23 = 0)   cov(β̂2, β̂3)
(1)            (2)      (3)              (4)                                 (5)
0.00           1.00     σ²/Σx2i² = A                                         0
0.50           1.33     1.33 × A         1.33                                0.67 × B
0.70           1.96     1.96 × A         1.96                                1.37 × B
0.80           2.78     2.78 × A         2.78                                2.22 × B
0.90           5.26     5.26 × A         5.26                                4.73 × B
0.95           10.26    10.26 × A        10.26                               9.74 × B
0.97           16.92    16.92 × A        16.92                               16.41 × B
0.99           50.25    50.25 × A        50.25                               49.75 × B
0.995          100.00   100.00 × A       100.00                              99.50 × B
0.999          500.00   500.00 × A       500.00                              499.50 × B

Note: A = σ²/Σx2i²; B = −σ²/√(Σx2i² Σx3i²); × = times.
*To find out the effect of increasing r23 on var(β̂3), note that A = σ²/Σx3i² when r23 = 0, but the variance and covariance magnifying factors remain the same.

[Figure 10.2 The behavior of var(β̂2) as a function of r23 (vertical axis: var(β̂2); horizontal axis: r23, from 0 to 1.0).]

Before proceeding further, it may be noted that the inverse of the VIF is called tolerance (TOL). That is,

$$\text{TOL}_j = \frac{1}{\text{VIF}_j} = (1 - R_j^2) \qquad (10.5.5)$$

When $R_j^2 = 1$ (i.e., perfect collinearity), $\text{TOL}_j = 0$ and when $R_j^2 = 0$ (i.e., no collinearity whatsoever), $\text{TOL}_j$ is 1. Because of the intimate connection between VIF and TOL, one can use them interchangeably.
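In the k-variable case, $R_j^2$ comes from an auxiliary regression of $X_j$ on the other regressors. The sketch below computes $\text{VIF}_j$ and $\text{TOL}_j$ that way with plain numpy least squares; `vif_and_tol` is a hypothetical helper of our own, not a library function. As a check it is applied to the $X_2$ and $X_3^*$ data of Section 10.1, where, with only two regressors, $R_j^2$ reduces to $r_{23}^2$.

```python
import numpy as np

def vif_and_tol(X):
    """Return arrays (VIF_j, TOL_j) for every column X_j of the regressor matrix X.

    X holds the regressors only (no column of ones). R_j^2 comes from the
    auxiliary regression of X_j on an intercept and all the other columns.
    """
    n, p = X.shape
    vifs, tols = np.empty(p), np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2_j = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        tols[j] = 1.0 - r2_j          # TOL_j = 1 - R_j^2, Eq. (10.5.5)
        vifs[j] = 1.0 / tols[j]       # VIF_j = 1 / (1 - R_j^2)
    return vifs, tols

# Usage with the X2 and X3* data of Section 10.1
X2 = np.array([10, 15, 18, 24, 30], dtype=float)
X3_star = np.array([52, 75, 97, 129, 152], dtype=float)
vifs, tols = vif_and_tol(np.column_stack([X2, X3_star]))
print("VIF:", np.round(vifs, 2))
print("TOL:", np.round(tols, 5))
```

Both VIFs come out at roughly 122, i.e., $1/(1 - 0.9959^2)$, so the variance of each slope estimator would be inflated more than a hundredfold relative to uncorrelated regressors.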


Wider Confidence Intervals 


Because of the large standard errors, the confidence intervals for the relevant population parameters tend to be larger, as can be seen from Table 10.2. For example, when $r_{23} = 0.95$, the confidence interval for $\beta_2$ is larger than when $r_{23} = 0$ by a factor of $\sqrt{10.26}$, or about 3.

Therefore, in cases of high multicollinearity, the sample data may be compatible with a diverse set of hypotheses. Hence, the probability of accepting a false hypothesis (i.e., type II error) increases.


Table 10.2 The Effect of Increasing Collinearity on the 95% Confidence Interval for β2: β̂2 ± 1.96 se(β̂2)

Value of r23    95% Confidence Interval for β2
0.00            β̂2 ± 1.96 √(σ²/Σx2i²)
0.50            β̂2 ± 1.96 √[(1.33) σ²/Σx2i²]
0.95            β̂2 ± 1.96 √[(10.26) σ²/Σx2i²]
0.995           β̂2 ± 1.96 √[(100) σ²/Σx2i²]
0.999           β̂2 ± 1.96 √[(500) σ²/Σx2i²]

Note: We are using the normal distribution because σ² is assumed for convenience to be known. Hence the use of 1.96, the 95% confidence factor for the normal distribution.
The standard errors corresponding to the various r23 values are obtained from Table 10.1.
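Because $\operatorname{se}(\hat\beta_2) = \sqrt{\sigma^2/\sum x_{2i}^2}\,\sqrt{\text{VIF}}$ by Eq. (10.5.2), the width of each interval in Table 10.2 relative to the $r_{23} = 0$ case is simply $\sqrt{\text{VIF}}$. A short sketch of that arithmetic, using the exact VIF formula rather than the rounded table values:

```python
import math

ratios = {}
for r23 in (0.00, 0.50, 0.95, 0.995, 0.999):
    vif_value = 1.0 / (1.0 - r23**2)          # Eq. (10.5.1)
    ratios[r23] = math.sqrt(vif_value)        # width relative to the r23 = 0 interval
    print(f"r23 = {r23:5.3f}: interval is {ratios[r23]:6.2f} times as wide as when r23 = 0")
```

For $r_{23} = 0.95$ the factor is about 3.2, the "about 3" quoted in the text.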


“Insignificant” t Ratios 


Recall that to test the null hypothesis that, say, $\beta_2 = 0$, we use the t ratio, that is, $\hat\beta_2/\operatorname{se}(\hat\beta_2)$, and compare the estimated t value with the critical t value from the t table. But as we have seen, in cases of high collinearity the estimated standard errors increase dramatically, thereby making the t values smaller. Therefore, in such cases, one will increasingly accept the null hypothesis that the relevant true population value is zero.¹³


A High R² but Few Significant t Ratios


Consider the k-variable linear regression model: 


$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i$$


13In terms of the confidence intervals, the $\beta_2 = 0$ value will lie increasingly in the acceptance region as the degree of collinearity increases.

In cases of high collinearity, it is possible to find, as we have just noted, that one or more of the partial slope coefficients are individually statistically insignificant on the basis of the t test. Yet the $R^2$ in such situations may be so high, say, in excess of 0.9, that on the basis of the F test one can convincingly reject the hypothesis that $\beta_2 = \beta_3 = \cdots = \beta_k = 0$. Indeed, this is one of the signals of multicollinearity: insignificant t values but a high overall $R^2$ (and a significant F value)!

We shall demonstrate this signal in the next section, but this outcome should not be surprising in view of 
our discussion on individual versus joint testing in Chapter 8. As you may recall, the real problem here is 
the covariances between the estimators, which, as formula (7.4.17) indicates, are related to the correlations 
between the regressors. 


Sensitivity of OLS Estimators and Their Standard Errors to Small Changes in Data


As long as multicollinearity is not perfect, estimation of the regression coefficients is possible but the 
estimates and their standard errors become very sensitive to even the slightest change in the data. 
To see this, consider Table 10.3. Based on these data, we obtain the following multiple regression: 


$$\begin{aligned}
\hat Y_i &= 1.1939 + 0.4463X_{2i} + 0.0030X_{3i} \\
\text{se} &= (0.7737) \qquad (0.1848) \qquad (0.0851) \\
t &= (1.5431) \qquad (2.4151) \qquad (0.0358) \qquad\qquad (10.5.6) \\
R^2 &= 0.8101 \qquad r_{23} = 0.5523 \\
\operatorname{cov}(\hat\beta_2, \hat\beta_3) &= -0.00868 \qquad df = 2
\end{aligned}$$

Regression (10.5.6) shows that none of the regression coefficients is individually significant at the conventional 1 or 5 percent levels of significance, although $\hat\beta_2$ is significant at the 10 percent level on the basis of a one-tail t test.

Now consider Table 10.4. The only difference between Tables 10.3 and 10.4 is that the third and fourth values of $X_3$ are interchanged. Using the data of Table 10.4, we now obtain

$$\begin{aligned}
\hat Y_i &= 1.2108 + 0.4014X_{2i} + 0.0270X_{3i} \\
\text{se} &= (0.7480) \qquad (0.2721) \qquad (0.1252) \\
t &= (1.6187) \qquad (1.4752) \qquad (0.2158) \qquad\qquad (10.5.7) \\
R^2 &= 0.8143 \qquad r_{23} = 0.8285 \\
\operatorname{cov}(\hat\beta_2, \hat\beta_3) &= -0.0282 \qquad df = 2
\end{aligned}$$

Table 10.3 Hypothetical Data on Y, X2, and X3

Y    X2    X3
1     2     4
2     0     2
3     4    12
4     6     0
5     8    16

Table 10.4 Hypothetical Data on Y, X2, and X3

Y    X2    X3
1     2     4
2     0     2
3     4     0
4     6    12
5     8    16
As a result of a slight change in the data, we see that $\hat\beta_2$, which was statistically significant before at the 10 percent level of significance, is no longer significant even at that level. Also note that in Eq. (10.5.6) $\operatorname{cov}(\hat\beta_2, \hat\beta_3) = -0.00868$ whereas in Eq. (10.5.7) it is −0.0282, a more than threefold increase. All these changes may be attributable to increased multicollinearity: In Eq. (10.5.6) $r_{23} = 0.5523$, whereas in Eq. (10.5.7) it is 0.8285. Similarly, the standard errors of $\hat\beta_2$ and $\hat\beta_3$ increase between the two regressions, a usual symptom of collinearity.

We noted earlier that in the presence of high collinearity one cannot estimate the individual regression coefficients precisely but that linear combinations of these coefficients may be estimated more precisely. This fact can be substantiated from the regressions (10.5.6) and (10.5.7). In the first regression the sum of the two partial slope coefficients is 0.4493 and in the second it is 0.4284, practically the same. Not only that, their standard errors are practically the same, 0.1550 vs. 0.1823.¹⁴ Note, however, the coefficient of $X_3$ has changed dramatically, from 0.003 to 0.027.
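Regressions (10.5.6) and (10.5.7) can be reproduced from Tables 10.3 and 10.4 with a few lines of matrix OLS. In this sketch `ols` is our own minimal helper (coefficients, their estimated covariance matrix $\hat\sigma^2 (X'X)^{-1}$, and $R^2$), and the standard error of $\hat\beta_2 + \hat\beta_3$ uses the formula given in footnote 14.

```python
import numpy as np

def ols(Y, *regressors):
    """OLS with an intercept; returns coefficients, their estimated covariance matrix, and R^2."""
    X = np.column_stack([np.ones(len(Y)), *regressors])
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ Y
    resid = Y - X @ b
    sigma2 = resid @ resid / (n - k)                  # unbiased estimate of sigma^2
    R2 = 1.0 - resid @ resid / np.sum((Y - Y.mean()) ** 2)
    return b, sigma2 * XtX_inv, R2

Y = np.array([1, 2, 3, 4, 5], dtype=float)
X2 = np.array([2, 0, 4, 6, 8], dtype=float)
X3_a = np.array([4, 2, 12, 0, 16], dtype=float)       # Table 10.3
X3_b = np.array([4, 2, 0, 12, 16], dtype=float)       # Table 10.4: 3rd and 4th values swapped

results = {}
for label, X3 in (("Table 10.3", X3_a), ("Table 10.4", X3_b)):
    b, V, R2 = ols(Y, X2, X3)
    se = np.sqrt(np.diag(V))
    se_sum = np.sqrt(V[1, 1] + V[2, 2] + 2 * V[1, 2])  # se(beta2-hat + beta3-hat)
    results[label] = (b, se, V[1, 2], b[1] + b[2], se_sum)
    print(label, "b =", np.round(b, 4), "se =", np.round(se, 4),
          f"cov = {V[1, 2]:.5f}", f"b2 + b3 = {b[1] + b[2]:.4f}", f"se(b2 + b3) = {se_sum:.4f}")
```

Swapping just two values of $X_3$ moves the individual slopes and their standard errors noticeably, while the sum of the slopes and its standard error change much less.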


Consequences of Micronumerosity


In a parody of the consequences of multicollinearity, and in a tongue-in-cheek manner, Goldberger cites exactly similar consequences of micronumerosity, that is, analysis based on small sample size.¹⁵ The reader is advised to read Goldberger’s analysis to see why he regards micronumerosity as being as important as multicollinearity.

14These standard errors are obtained from the formula

$$\operatorname{se}(\hat\beta_2 + \hat\beta_3) = \sqrt{\operatorname{var}(\hat\beta_2) + \operatorname{var}(\hat\beta_3) + 2\operatorname{cov}(\hat\beta_2, \hat\beta_3)}$$

Note that increasing collinearity increases the variances of $\hat\beta_2$ and $\hat\beta_3$, but these variances may be offset if there is high negative covariance between the two, as our results clearly point out.

15Goldberger, op. cit., pp. 248-250.


10.6 An Illustrative Example


Example 10.1 Consumption Expenditure in Relation to Income and Wealth


To illustrate the various points made thus far, let us consider the consumption-income example from the introduction. Table 10.5 contains hypothetical data on consumption, income, and wealth. If we assume that consumption expenditure is linearly related to income and wealth, then, from Table 10.5 we obtain the following regression:

$$\begin{aligned}
\hat Y_i &= 24.7747 + 0.9415X_{2i} - 0.0424X_{3i} \\
\text{se} &= (6.7525) \qquad (0.8229) \qquad (0.0807) \\
t &= (3.6690) \qquad (1.1442) \qquad (-0.5261) \qquad\qquad (10.6.1) \\
R^2 &= 0.9635 \qquad \bar R^2 = 0.9531 \qquad df = 7
\end{aligned}$$

Table 10.5 Hypothetical Data on Consumption Expenditure Y, Income X2, and Wealth X3

Y, $    X2, $    X3, $
 70       80       810
 65      100      1009
 90      120      1273
 95      140      1425
110      160      1633
115      180      1876
120      200      2052
140      220      2201
155      240      2435
150      260      2686


Regression (10.6.1) shows that income and wealth together explain about 96 percent of the variation in consumption expenditure, and yet neither of the slope coefficients is individually statistically significant. Moreover, not only is the wealth variable statistically insignificant but also it has the wrong sign. A priori, one would expect a positive relationship between consumption and wealth. Although $\hat\beta_2$ and $\hat\beta_3$ are individually statistically insignificant, if we test the hypothesis that $\beta_2 = \beta_3 = 0$ simultaneously, this hypothesis can be rejected, as Table 10.6 shows. Under the usual assumption we obtain

$$F = \frac{4282.7770}{46.3494} = 92.4019 \qquad (10.6.2)$$

This F value is obviously highly significant.


Table 10.6 ANOVA Table for the Consumption-Income-Wealth Example

Source of Variation    SS            df    MSS
Due to regression      8,565.5541     2    4,282.7770
Due to residual          324.4459     7       46.3494
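The entries of regression (10.6.1) and Table 10.6 can be recomputed from the data of Table 10.5. The following numpy sketch fits the three-variable model by least squares and derives the t ratios, $R^2$, $\bar R^2$, and the F statistic of Eq. (10.6.2); it makes no claim about how the original figures were produced.

```python
import numpy as np

# Table 10.5
Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
X2 = np.arange(80, 261, 20, dtype=float)            # income: 80, 100, ..., 260
X3 = np.array([810, 1009, 1273, 1425, 1633, 1876, 2052, 2201, 2435, 2686], dtype=float)

X = np.column_stack([np.ones_like(Y), X2, X3])
n, k = X.shape
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ b

RSS = resid @ resid
TSS = np.sum((Y - Y.mean()) ** 2)
ESS = TSS - RSS
sigma2 = RSS / (n - k)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t = b / se
F = (ESS / (k - 1)) / (RSS / (n - k))               # Eq. (10.6.2)
R2 = ESS / TSS
R2_adj = 1 - (1 - R2) * (n - 1) / (n - k)

print("coefficients:", np.round(b, 4))
print("t ratios:    ", np.round(t, 4))
print(f"ESS = {ESS:.4f}, RSS = {RSS:.4f}")
print(f"R2 = {R2:.4f}, adjusted R2 = {R2_adj:.4f}, F = {F:.4f} with df ({k - 1}, {n - k})")
```

The two slope t ratios are small while F is above 90: the "high $R^2$, few significant t ratios" signal of multicollinearity.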


It is interesting to look at this result geometrically. (See Figure 10.3.) Based on the regression (10.6.1), we have established the individual 95 percent confidence intervals for $\beta_2$ and $\beta_3$ following the usual procedure discussed in Chapter 8. As these intervals show, individually each of them includes the value of zero. Therefore, individually we can accept the hypothesis that the two partial slopes are zero. But, when we establish the joint confidence interval to test the hypothesis that $\beta_2 = \beta_3 = 0$, that hypothesis cannot be accepted since the joint confidence interval, actually an ellipse, does not include the origin.¹⁶ As already pointed out, when collinearity is high, tests on individual regressors are not reliable; in such cases it is the overall F test that will show if Y is related to the various regressors.

Our example shows dramatically what multicollinearity does. The fact that the F test is significant but the t values of $X_2$ and $X_3$ are individually insignificant means that the two variables are so highly correlated that it is impossible to isolate the individual impact of either income or wealth on consumption. As a matter of fact, if we regress $X_3$ on $X_2$, we obtain


$$\hat{X}_{3i} = 7.5454 + 10.1909X_{2i} \qquad (10.6.3)$$
$$\text{se} = (29.4758)\quad (0.1643)$$
$$t = (0.2560)\quad (62.0405) \qquad R^2 = 0.9979$$

which shows that there is almost perfect collinearity between $X_3$ and $X_2$.
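Regression (10.6.3) also tells us how strongly the variances in (10.6.1) are inflated. With only two regressors, the $R^2$ of the regression of $X_3$ on $X_2$ is the squared correlation $r_{23}^2$, so the variance-inflation factor is $1/(1 - r_{23}^2)$ and the tolerance is its reciprocal. A minimal arithmetic sketch:

```python
r2_aux = 0.9979               # R^2 of regression (10.6.3), X3 on X2
vif = 1.0 / (1.0 - r2_aux)    # variance-inflation factor
tol = 1.0 - r2_aux            # tolerance, the reciprocal of the VIF
print(f"VIF = {vif:.1f}, TOL = {tol:.4f}")
```

A VIF of roughly 476 means the sampling variances of $\hat{\beta}_2$ and $\hat{\beta}_3$ are about 476 times what they would be if $X_2$ and $X_3$ were uncorrelated, which is consistent with the small t ratios in (10.6.1).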
Now let us see what happens if we regress Y on $X_2$ only:

$$\hat{Y}_i = 24.4545 + 0.5091X_{2i} \qquad (10.6.4)$$
$$\text{se} = (6.4138)\quad (0.0357)$$
$$t = (3.8128)\quad (14.2432) \qquad R^2 = 0.9621$$


¹⁶As noted in Section 5.3, the topic of joint confidence intervals is rather involved. The interested reader may consult the reference cited there.
[Figure 10.3 Individual confidence intervals for $\beta_2$ and $\beta_3$ and joint 95% confidence interval (ellipse) for $\beta_2$ and $\beta_3$.]


In Eq. (10.6.1) the income variable was statistically insignificant, whereas now it is highly significant. If instead of regressing Y on $X_2$ we regress it on $X_3$, we obtain


$$\hat{Y}_i = 24.411 + 0.0498X_{3i} \qquad (10.6.5)$$
$$\text{se} = (6.874)\quad (0.0037)$$
$$t = (3.551)\quad (13.29) \qquad R^2 = 0.9567$$


We see that wealth now has a significant impact on consumption expenditure, whereas in Eq. (10.6.1) it had no statistically significant effect on consumption expenditure.

Regressions (10.6.4) and (10.6.5) show very clearly that in situations of extreme multicollinearity dropping 
the highly collinear variable will often make the other X variable statistically significant. This result would 
suggest that a way out of extreme collinearity is to drop the collinear variable, but we shall have more to say 
about it in Section 10.8. 
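The pattern in regressions (10.6.1) through (10.6.4) can be reproduced with simulated data. The sketch below is hypothetical: it does not use the Table 10.5 data, and the sample size, coefficients, and noise levels are arbitrary choices. It builds a "wealth" series that is almost an exact multiple of "income", fits both the two-regressor model and the income-only model with plain NumPy least squares, and compares the overall F, the t ratios, and the standard errors.

```python
import numpy as np

def ols(y, X):
    """OLS with the intercept column already in X; returns coefficients, standard errors, R^2."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)            # unbiased estimate of the error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)       # covariance matrix of the estimates
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, np.sqrt(np.diag(cov)), r2

rng = np.random.default_rng(0)
n = 50
x2 = np.linspace(80, 260, n)                    # hypothetical "income"
x3 = 10 * x2 + rng.normal(0, 0.5, n)            # hypothetical "wealth", nearly 10 x income
y = 25 + 0.5 * x2 + rng.normal(0, 5, n)         # true wealth coefficient is zero

ones = np.ones(n)
b_both, se_both, r2_both = ols(y, np.column_stack([ones, x2, x3]))
b_one, se_one, r2_one = ols(y, np.column_stack([ones, x2]))

k = 3
f_both = (r2_both / (k - 1)) / ((1 - r2_both) / (n - k))   # overall F of the joint model
print("joint model: t =", b_both[1:] / se_both[1:], " F =", f_both)
print("income only: t =", b_one[1] / se_one[1])
print("se of income coefficient, joint vs. alone:", se_both[1], se_one[1])
```

The overall F is large in both fits, but the standard error of the income coefficient in the joint model is larger by orders of magnitude, so its t ratio collapses; dropping the near-duplicate regressor restores a large t ratio, as in (10.6.4).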


Example 10.2 Consumption Function for United States, 1947-2000 


We now consider a concrete set of data on real consumption expenditure (C), real disposable personal income 
(Yd), real wealth (W), and real interest rate (I) for the United States for the period 1947-2000. The raw data 
are given in Table 10.7. 

We use the following model for analysis:

$$\ln C_t = \beta_1 + \beta_2 \ln Yd_t + \beta_3 \ln W_t + \beta_4 I_t + u_t \qquad (10.6.6)$$

where ln stands for the natural logarithm.

In this model the coefficients $\beta_2$ and $\beta_3$ give the income and wealth elasticities, respectively (why?), and $\beta_4$ gives the semielasticity (why?). The results of regression (10.6.6) are given in the following table.
Table 10.7 U.S. Consumption Expenditure for the Period 1947-2000

Year   C        Yd       W          I
1947   976.4    1035.2   5166.815   -10.35094
1948   998.1    1090     5280.757   -4.719804
1949   1025.3   1095.6   5607.351   1.044063
1950   1090.9   1192.7   5759.515   0.407346
1951   1107.1   1227     6086.056   5.263152
1952   1142.4   1266.8   6243.864   -0.277011
1953   1197.2   1327.5   6355.613   0.561137
1954   1220.9   1344     6797.027   -0.138476
1955   1310.4   1433.8   7172.242   0.261997
1956   1348.8   1502.3   7375.18    -0.736124
1957   1381.8   1539.5   7315.286   -0.260683
1958   1393     1553.7   7869.975   -0.57463
1959   1470.7   1623.8   8188.054   2.295943
1960   1510.8   1664.8   8351.757   1.50116
1961   1541.2   1720     8971.872   1.296432
1962   1617.3   1803.5   9091.545   1.395922
1963   1684     1871.5   9436.097   2.057616
1964   1784.8   2006.9   10003.4    2.026599
1965   1897.6   2131     10562.81   2.111669
1966   2006.1   2244.6   10522.04   2.020251
1967   2066.2   2340.5   11312.07   1.212616
1968   2184.2   2448.2   12145.41   1.054986
1969   2264.8   2524.3   11672.25   1.732154
1970   2317.5   2630     11650.04   1.166228
1971   2405.2   2745.3   12312.92   -0.712241
1972   2550.5   2874.3   13499.92   -0.155737
1973   2675.9   3072.3   13080.96   1.413839
1974   2653.7   3051.9   11868.79   -1.042571
1975   2710.9   3108.5   12634.36   -3.533585
1976   2868.9   3243.5   13456.78   -0.656766
1977   2992.1   3360.7   13786.31   -1.190427
1978   3124.7   3527.5   14450.5    0.113048
1979   3203.2   3628.6   15340      1.70421
1980   3193     3658     15964.95   2.298496
1981   3236     3741.1   15964.99   4.703847
1982   3275.5   3791.7   16312.51   4.449027
1983   3454.3   3906.9   16944.85   4.690972
1984   3640.6   4207.6   17526.75   5.848332
1985   3820.9   4347.8   19068.35   4.330504
1986   3981.2   4486.6   20530.04   3.768031
1987   4113.4   4582.5   21235.69   2.819469
1988   4279.5   4784.1   22331.99   3.287061
1989   4393.7   4906.5   23659.8    4.317956
1990   4474.5   5014.2   23105.13   3.595025
1991   4466.6   5033     24050.21   1.802757
1992   4594.5   5189.3   24418.2    1.007439
1993   4748.9   5261.3   25092.33   0.62479
1994   4928.1   5397.2   25218.6    2.206002
1995   5075.6   5539.1   27439.73   3.333143
1996   5237.5   5677.7   29448.19   3.083201
1997   5423.9   5854.5   32664.07   3.12
1998   5683.7   6168.6   35587.02   3.583909
1999   5968.4   6320     39591.26   3.245271
2000   6257.8   6539.2   38167.72   3.57597
Source: See Table 7.12.
Dependent Variable: LOG(C)
Method: Least Squares
Sample: 1947-2000
Included observations: 54

Variable        Coefficient   Std. Error   t-Statistic   Prob.
C               -0.467711     0.042778     -10.93343     0.0000
LOG(YD)          0.804873     0.017498      45.99836     0.0000
LOG(WEALTH)      0.201270     0.017593      11.44060     0.0000
INTEREST        -0.002689     0.000762      -3.529265    0.0009

R-squared              0.999560    Mean dependent var.      7.826093
Adjusted R-squared     0.999533    S.D. dependent var.      0.552368
S.E. of regression     0.011934    Akaike info criterion   -5.947703
Sum squared resid.     0.007121    Schwarz criterion       -5.800371
Log likelihood       164.5880      Hannan-Quinn criter.    -5.890883
F-statistic        37832.59        Durbin-Watson stat.      1.289219
Prob(F-statistic)      0.000000

Note: LOG stands for natural log.


The results show that all the estimated coefficients are highly statistically significant, for their p values are extremely small. The estimated coefficients are interpreted as follows. The income elasticity is about 0.80, suggesting that, holding other variables constant, if income goes up by 1 percent, mean consumption expenditure goes up by about 0.8 percent. The wealth coefficient is about 0.20, meaning that if wealth goes up by 1 percent, mean consumption goes up by only about 0.2 percent, again holding other variables constant. The coefficient of the interest rate variable tells us that as the interest rate goes up by one percentage point, mean consumption expenditure goes down by about 0.27 percent (100 × 0.002689), ceteris paribus.
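The elasticity and semielasticity readings above are first-order approximations. The Python sketch below computes the exact percentage changes implied by the estimated log-log and log-lin coefficients. The coefficient values are those of the regression output; the wealth coefficient equals its standard error times its t statistic, 0.017593 × 11.44060 ≈ 0.2013. The formulas $100[(1.01)^{b} - 1]$ and $100(e^{b} - 1)$ are the standard exact conversions, used here only to show that the approximations are very close.

```python
import math

b_income, b_wealth, b_interest = 0.804873, 0.201270, -0.002689  # estimates of (10.6.6)

# Log-log terms: a 1 percent rise in X changes C by about b percent,
# exactly 100 * (1.01**b - 1) percent.
pct_income = 100 * (1.01 ** b_income - 1)
pct_wealth = 100 * (1.01 ** b_wealth - 1)

# Log-lin term: one more percentage point of interest changes C by about 100*b percent,
# exactly 100 * (exp(b) - 1) percent.
pct_interest = 100 * (math.exp(b_interest) - 1)

print(f"income: {pct_income:.3f}%  wealth: {pct_wealth:.3f}%  interest: {pct_interest:.3f}%")
```

Because the coefficients are small, the exact and approximate figures agree to the second decimal place.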

All the regressors have signs that accord with prior expectations, that is, income and wealth both have a 
positive impact on consumption but interest rate has a negative impact. 
Do we have to worry about the problem of multicollinearity in the present case? Apparently not, because all the coefficients have the right signs, each coefficient is individually statistically significant, and the F value is also statistically highly significant, suggesting that, collectively, all the variables have a significant impact on consumption expenditure. The $R^2$ value is also quite high.

Of course, there is usually some degree of collinearity among economic variables. As long as it is not exact, 
we can still estimate the parameters of the model. For now, all we can say is that, in the present example, 
collinearity, if any, does not seem to be very severe. But in Section 10.7 we provide some diagnostic tests to 
detect collinearity and reexamine the U.S. consumption function to determine whether it is plagued by the 
collinearity problem. 


10.7 Detection of Multicollinearity 


Having studied the nature and consequences of multicollinearity, we naturally ask: How does one know that collinearity is present in any given situation, especially in models involving more than two explanatory variables? Here it is useful to bear in mind Kmenta’s warning:


1. Multicollinearity is a question of degree and not of kind. The meaningful distinction is not between the 
presence and the absence of multicollinearity, but between its various degrees. 

2. Since multicollinearity refers to the condition of the explanatory variables that are assumed to be nonstochastic, it is a feature of the sample and not of the population.
Therefore, we do not “test for multicollinearity” but can, if we wish, measure its degree in any particular sample.¹⁷


Since multicollinearity is essentially a sample phenomenon, arising out of the largely nonexperimental 
data collected in most social sciences, we do not have one unique method of detecting it or measuring its 
strength. What we have are some rules of thumb, some informal and some formal, but rules of thumb all the 
same. We now consider some of these rules. 

1. High $R^2$ but few significant t ratios. As noted, this is the “classic” symptom of multicollinearity. If $R^2$ is high, say, in excess of 0.8, the F test in most cases will reject the hypothesis that the partial slope coefficients are simultaneously equal to zero, but the individual t tests will show that none or very few of the partial slope coefficients are statistically different from zero. This fact was clearly demonstrated by our consumption-income-wealth example.

Although this diagnostic is sensible, its disadvantage is that “it is too strong in the sense that multicollinearity is considered as harmful only when all of the influences of the explanatory variables on Y cannot be disentangled.”¹⁸

2. High pair-wise correlations among regressors. Another suggested rule of thumb is that if the pair-wise 
or zero-order correlation coefficient between two regressors is high, say, in excess of 0.8, then multicollinearity is a serious problem. The problem with this criterion is that, although high zero-order correlations
may suggest collinearity, it is not necessary that they be high to have collinearity in any specific case. To put 
the matter somewhat technically, high zero-order correlations are a sufficient but not a necessary condition 
for the existence of multicollinearity because it can exist even though the zero-order or simple correlations 
are comparatively low (say, less than 0.50). To see this relationship, suppose we have a four-variable model: 


$$Y_i = \beta_1 + \beta_2X_{2i} + \beta_3X_{3i} + \beta_4X_{4i} + u_i$$

and suppose that

$$X_{4i} = \lambda_2X_{2i} + \lambda_3X_{3i}$$


¹⁷Jan Kmenta, Elements of Econometrics, 2d ed., Macmillan, New York, 1986, p. 431.
¹⁸Ibid., p. 439.
where $\lambda_2$ and $\lambda_3$ are constants, not both zero. Obviously, $X_4$ is an exact linear combination of $X_2$ and $X_3$, giving $R^2_{4.23} = 1$, the coefficient of determination in the regression of $X_4$ on $X_2$ and $X_3$.
Now recalling formula (7.11.5) from Chapter 7, we can write

$$R^2_{4.23} = \frac{r^2_{42} + r^2_{43} - 2r_{42}r_{43}r_{23}}{1 - r^2_{23}} \qquad (10.7.1)$$


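But since $R^2_{4.23} = 1$ because of perfect collinearity, we obtain

$$1 = \frac{r^2_{42} + r^2_{43} - 2r_{42}r_{43}r_{23}}{1 - r^2_{23}} \qquad (10.7.2)$$

It is not difficult to see that Eq. (10.7.2) is satisfied by $r_{42} = 0.5$, $r_{43} = 0.5$, and $r_{23} = -0.5$ (the numerator and the denominator both equal 0.75), which are not very high values.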
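The numerical claim can be checked in two ways. The sketch below first evaluates Eq. (10.7.1) at the stated correlations, then simulates a hypothetical data set in which $X_2$ and $X_3$ have correlation $-0.5$ and $X_4 = X_2 + X_3$ (that is, $\lambda_2 = \lambda_3 = 1$, an illustrative choice). All pairwise correlations are moderate, yet $X_4$ is an exact linear combination of the other two.

```python
import numpy as np

def r2_from_correlations(r42, r43, r23):
    """R^2 of X4 on X2 and X3 computed from zero-order correlations, Eq. (10.7.1)."""
    return (r42**2 + r43**2 - 2 * r42 * r43 * r23) / (1 - r23**2)

print(r2_from_correlations(0.5, 0.5, -0.5))   # perfect collinearity from modest correlations

# Simulated check with X2, X3 standardized and corr(X2, X3) = -0.5
rng = np.random.default_rng(42)
n = 100_000
z1, z2 = rng.standard_normal((2, n))
x2 = z1
x3 = -0.5 * z1 + np.sqrt(0.75) * z2
x4 = x2 + x3                                  # exact linear combination
corr = np.corrcoef([x2, x3, x4])              # rows are variables
print(np.round(corr, 2))
```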

Therefore, in models involving more than two explanatory variables, the simple or zero-order correlation 
will not provide an infallible guide to the presence of multicollinearity. Of course, if there are only two 
explanatory variables, the zero-order correlations will suffice. 

3. Examination of partial correlations. Because of the problem just mentioned in relying on zero-order correlations, Farrar and Glauber have suggested that one should look at the partial correlation coefficients.¹⁹ Thus, in the regression of Y on $X_2$, $X_3$, and $X_4$, a finding that $R^2_{1.234}$ is very high but $r^2_{12.34}$, $r^2_{13.24}$, and $r^2_{14.23}$ are comparatively low may suggest that the variables $X_2$, $X_3$, and $X_4$ are highly intercorrelated and that at least one of these variables is superfluous.

Although a study of the partial correlations may be useful, there is no guarantee that they will provide an infallible guide to multicollinearity, for it may happen that both $R^2$ and all the partial correlations are sufficiently high. But more importantly, C. Robert Wichers has shown²⁰ that the Farrar-Glauber partial correlation test is ineffective in that a given partial correlation may be compatible with different multicollinearity patterns. The Farrar-Glauber test has also been severely criticized by T. Krishna Kumar²¹ and John O’Hagan and Brendan McCabe.²²

4. Auxiliary regressions. Since multicollinearity arises because one or more of the regressors are exact or approximately linear combinations of the other regressors, one way of finding out which X variable is related to other X variables is to regress each $X_i$ on the remaining X variables and compute the corresponding $R^2$, which we designate as $R^2_i$; each one of these regressions is called an auxiliary regression, auxiliary to the main regression of Y on the X’s. Then, following the relationship between F and $R^2$ established in Eq. (8.4.11), the variable

$$F_i = \frac{R^2_{x_i \cdot x_2 x_3 \cdots x_k}/(k-2)}{\left(1 - R^2_{x_i \cdot x_2 x_3 \cdots x_k}\right)/(n-k+1)} \qquad (10.7.3)$$

follows the F distribution with k − 2 and n − k + 1 df. In Eq. (10.7.3) n stands for the sample size, k stands for the number of explanatory variables including the intercept term, and $R^2_{x_i \cdot x_2 x_3 \cdots x_k}$ is the coefficient of determination in the regression of variable $X_i$ on the remaining X variables.²³


¹⁹D. E. Farrar and R. R. Glauber, “Multicollinearity in Regression Analysis: The Problem Revisited,” Review of Economics and Statistics, vol. 49, 1967, pp. 92-107.

²⁰“The Detection of Multicollinearity: A Comment,” Review of Economics and Statistics, vol. 57, 1975, pp. 365-366.
²¹“Multicollinearity in Regression Analysis,” Review of Economics and Statistics, vol. 57, 1975, pp. 366-368.

²²“Tests for the Severity of Multicollinearity in Regression Analysis: A Comment,” Review of Economics and Statistics, vol. 57, 1975, pp. 368-370.

²³For example, $R^2_{x_2}$ can be obtained by regressing $X_{2i}$ as follows: $X_{2i} = a_1 + a_3X_{3i} + a_4X_{4i} + \cdots + a_kX_{ki} + \hat{u}_i$.
If the computed F exceeds the critical F at the chosen level of significance, it is taken to mean that the particular $X_i$ is collinear with other X’s; if it does not exceed the critical F, we say that it is not collinear with other X’s, in which case we may retain that variable in the model. If $F_i$ is statistically significant, we will still have to decide whether the particular $X_i$ should be dropped from the model. This question will be taken up in Section 10.8.
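The auxiliary-regression procedure can be sketched in a few lines of NumPy. The helper names `auxiliary_r2` and `auxiliary_f` are hypothetical, and the data are simulated with $X_4$ built as nearly $X_2 + X_3$. The F statistic follows Eq. (10.7.3) as stated in the text, with k counting the explanatory variables including the intercept; the critical value would still be read from an F table at the chosen significance level.

```python
import numpy as np

def auxiliary_r2(X, j):
    """R^2 from regressing column j of X on the other columns plus an intercept."""
    target = X[:, j]
    Z = np.column_stack([np.ones(len(target)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    resid = target - Z @ coef
    return 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)

def auxiliary_f(r2_aux, n, k):
    """F statistic of Eq. (10.7.3); k counts explanatory variables including the intercept."""
    return (r2_aux / (k - 2)) / ((1.0 - r2_aux) / (n - k + 1))

rng = np.random.default_rng(1)
n = 30
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x4 = x2 + x3 + rng.normal(scale=0.1, size=n)   # nearly an exact linear combination
X = np.column_stack([x2, x3, x4])
k = X.shape[1] + 1                             # regressors plus the intercept

for j, name in enumerate(["X2", "X3", "X4"]):
    r2 = auxiliary_r2(X, j)
    print(name, round(r2, 4), round(auxiliary_f(r2, n, k), 1))
```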

But this method is not without its drawbacks, for 


“... if the multicollinearity involves only a few variables so that the auxiliary regressions do not suffer from extensive multicollinearity, the estimated coefficients may reveal the nature of the linear dependence among the regressors. Unfortunately, if there are several complex linear associations, this curve fitting exercise may not prove to be of much value as it will be difficult to identify the separate interrelationships.”²⁴


Instead of formally testing all auxiliary $R^2$ values, one may adopt Klein’s rule of thumb, which suggests that multicollinearity may be a troublesome problem only if the $R^2$ obtained from an auxiliary regression is greater than the overall $R^2$, that is, that obtained from the regression of Y on all the regressors.²⁵ Of course, like all other rules of thumb, this one should be used judiciously.
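Klein’s rule is a simple comparison, and the consumption-income-wealth example already supplies the numbers: the overall $R^2$ of regression (10.6.1) is 0.9635 and the auxiliary $R^2$ of regression (10.6.3) is 0.9979. With only two regressors, the auxiliary regressions of $X_2$ on $X_3$ and of $X_3$ on $X_2$ have the same $R^2$, namely $r_{23}^2$. The helper name below is hypothetical.

```python
def klein_flags(overall_r2, aux_r2):
    """Klein's rule of thumb: flag regressors whose auxiliary R^2 exceeds the overall R^2."""
    return [name for name, r2 in aux_r2.items() if r2 > overall_r2]

# Overall R^2 from (10.6.1); auxiliary R^2 from (10.6.3), shared by both regressors
flags = klein_flags(0.9635, {"income (X2)": 0.9979, "wealth (X3)": 0.9979})
print(flags)
```

Both regressors are flagged, in line with the diagnosis of Section 10.6.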

5. Eigenvalues and condition index. From EViews and Stata, we can find the eigenvalues and the condition index to diagnose multicollinearity. We will not discuss eigenvalues here, for that would take us into topics in matrix algebra that are beyond the scope of this book. From these eigenvalues, however, we can derive what is known as the condition number k, defined as

$$k = \frac{\text{Maximum eigenvalue}}{\text{Minimum eigenvalue}}$$

and the condition index (CI), defined as

$$\text{CI} = \sqrt{\frac{\text{Maximum eigenvalue}}{\text{Minimum eigenvalue}}} = \sqrt{k}$$

Then we have this rule of thumb: If k is between 100 and 1000, there is moderate to strong multicollinearity, and if it exceeds 1000, there is severe multicollinearity. Alternatively, if the CI (= $\sqrt{k}$) is between 10 and 30, there is moderate to strong multicollinearity, and if it exceeds 30, there is severe multicollinearity.

For the illustrative example in App. 7A.5, the smallest eigenvalue is 3.786 and the largest eigenvalue is 187.5269, giving k = 187.5269/3.786, or about 49.53. Therefore CI = $\sqrt{49.53} \approx 7.04$. Both k and CI suggest that we do not have a serious collinearity problem. Incidentally, note that a low eigenvalue (in relation to the maximum eigenvalue) is generally an indication of near-linear dependencies in the data.
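The condition number and condition index are easy to compute once the eigenvalues are known. The sketch below first reproduces the App. 7A.5 figures from the two extreme eigenvalues quoted above, then shows one way to obtain eigenvalues from a small hypothetical data matrix with `numpy.linalg.eigvalsh` applied to X′X. Packages differ in whether they rescale the columns first (Belsley, Kuh, and Welsch recommend scaling each column to unit length), so the numbers from this sketch need not match any particular program.

```python
import numpy as np

def condition_diagnostics(eigenvalues):
    """Condition number k = max/min eigenvalue and condition index CI = sqrt(k)."""
    ev = np.asarray(eigenvalues, dtype=float)
    k = ev.max() / ev.min()
    return k, np.sqrt(k)

def severity(k):
    """Rule of thumb from the text, stated in terms of the condition number k."""
    if k > 1000:
        return "severe"
    if k >= 100:
        return "moderate to strong"
    return "not serious"

k, ci = condition_diagnostics([187.5269, 3.786])   # extreme eigenvalues for App. 7A.5
print(f"k = {k:.2f}, CI = {ci:.4f}, {severity(k)}")

# Hypothetical data matrix: intercept column and two closely related regressors
X = np.column_stack([np.ones(5), [1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1]])
k_x, ci_x = condition_diagnostics(np.linalg.eigvalsh(X.T @ X))
print(f"hypothetical data: k = {k_x:.1f}, CI = {ci_x:.1f}, {severity(k_x)}")
```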

Some authors believe that the condition index is the best available multicollinearity diagnostic. But this 
opinion is not shared widely. For us, then, the CI is just a rule of thumb, a bit more sophisticated perhaps. But 
for further details, the reader may consult the references.²⁶

6. Tolerance and variance inflation factor. We have already introduced TOL and VIF. As $R^2_j$, the coefficient of determination in the regression of regressor $X_j$ on the remaining regressors in the model, increases toward unity, that is, as the collinearity of $X_j$ with the other regressors increases, VIF also increases, and in the limit it can be infinite.


²⁴George G. Judge, R. Carter Hill, William E. Griffiths, Helmut Lütkepohl, and Tsoung-Chao Lee, Introduction to the Theory and Practice of Econometrics, John Wiley & Sons, New York, 1982, p. 621.

²⁵Lawrence R. Klein, An Introduction to Econometrics, Prentice-Hall, Englewood Cliffs, NJ, 1962, p. 101.

²⁶See especially D. A. Belsley, E. Kuh, and R. E. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, John Wiley & Sons, New York, 1980, Chapter 3. However, this book is not for the beginner.
Some authors therefore use the VIF as an indicator of multicollinearity. The larger the value of $\text{VIF}_j$, the more “troublesome” or collinear the variable $X_j$. As a rule of thumb, if the VIF of a variable exceeds 10, which will happen if $R^2_j$ exceeds 0.90, that variable is said to be highly collinear.²⁷

Of course, one could use $\text{TOL}_j$ as a measure of multicollinearity in view of its intimate connection with $\text{VIF}_j$. The closer $\text{TOL}_j$ is to zero, the greater the degree of collinearity of that variable with the other regressors. On the other hand, the closer $\text{TOL}_j$ is to 1, the greater the evidence that $X_j$ is not collinear with the other regressors.
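A minimal NumPy sketch of the VIF and TOL calculation follows, assuming the usual definitions $\text{VIF}_j = 1/(1 - R^2_j)$ and $\text{TOL}_j = 1 - R^2_j$, with each $R^2_j$ from an auxiliary regression that includes an intercept. The data are simulated: $X_3$ is built to be strongly collinear with $X_2$, and $X_4$ is unrelated to both. The function name is hypothetical; statistical packages offer their own routines.

```python
import numpy as np

def vif_and_tol(X):
    """(VIF_j, TOL_j) for each column of X; X holds the regressors without an intercept column."""
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])   # auxiliary regression
        coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
        resid = target - Z @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out.append((1.0 / (1.0 - r2), 1.0 - r2))
    return out

rng = np.random.default_rng(7)
n = 200
x2 = rng.normal(size=n)
x3 = 0.98 * x2 + 0.2 * rng.normal(size=n)   # strongly collinear with x2
x4 = rng.normal(size=n)                     # unrelated regressor
results = vif_and_tol(np.column_stack([x2, x3, x4]))

for name, (vif, tol) in zip(["X2", "X3", "X4"], results):
    flag = "highly collinear" if vif > 10 else ""
    print(f"{name}: VIF = {vif:6.2f}, TOL = {tol:.3f} {flag}")
```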

VIF (or tolerance) as a measure of collinearity is not free of criticism. As the variance formula of Section 10.5, $\operatorname{var}(\hat{\beta}_j) = \dfrac{\sigma^2}{\sum x_j^2}\,\text{VIF}_j$, shows, $\operatorname{var}(\hat{\beta}_j)$ depends on three factors: $\sigma^2$, $\sum x_j^2$, and $\text{VIF}_j$. A high VIF can be counterbalanced by a low $\sigma^2$ or a high $\sum x_j^2$. To put it differently, a high VIF is neither necessary nor sufficient to get high variances and high standard errors. Therefore, high multicollinearity, as measured by a high VIF, may not necessarily cause high standard errors. In all this discussion, the terms high and low are used in a relative sense.
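The counterbalancing argument is easy to see with hypothetical numbers in the variance formula $\operatorname{var}(\hat{\beta}_j) = (\sigma^2/\sum x_j^2)\,\text{VIF}_j$, where $x_j$ is in deviation form. In the sketch below, a regressor with a VIF of 20 but a lot of sample variation ends up with a smaller coefficient variance than a nearly orthogonal regressor with little variation. All values are made up for illustration.

```python
def coef_variance(sigma2, sum_x2, vif):
    """var(beta_j) = sigma^2 / sum(x_j^2) * VIF_j, with x_j in deviation form."""
    return sigma2 / sum_x2 * vif

low_vif = coef_variance(sigma2=4.0, sum_x2=50.0, vif=1.2)     # little collinearity, little variation
high_vif = coef_variance(sigma2=4.0, sum_x2=5000.0, vif=20.0)  # strong collinearity, much variation
print(low_vif, high_vif)
```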

7. Scatterplot. It is a good practice to use a scatterplot to see how the various variables in a regression 
model are related. Figure 10.4 presents the scatterplot for the U.S. consumption example discussed in the 
previous section (Example 10.2). This is a four-by-four box diagram because we have four variables in the 
model, a dependent variable (C) and three explanatory variables: real disposable personal income (Yd), real 
wealth (W), and real interest rate (I).


[Figure 10.4 Scatterplot for Example 10.2 data.]


First consider the main diagonal, going from the upper left-hand corner to the lower right-hand corner. 
There are no scatterpoints in these boxes that lie on the main diagonal. If there were, they would have a 
correlation coefficient of 1, for the plots would be of a given variable against itself. The off-diagonal boxes 
show intercorrelations among the variables. Take, for instance, the wealth box (W). It shows that wealth and 
income are highly correlated (the correlation coefficient between the two is 0.97), but not perfectly so. If
they were perfectly correlated (i.e., if they had a correlation coefficient of 1), we would not have been able 
to estimate the regression (10.6.6) because we would have an exact linear relationship between wealth and 
income. The scatterplot also shows that the interest rate is not highly correlated with the other three variables. 
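The numbers behind a scatterplot matrix are the zero-order correlations, which can be tabulated directly. The sketch below uses `numpy.corrcoef` on hypothetical trending series (not the Table 10.7 data) and lists the pairs whose absolute correlation exceeds 0.8, the rule-of-thumb threshold mentioned earlier. If a plot is wanted, a scatterplot matrix can be drawn with a plotting library, for example `pandas.plotting.scatter_matrix`; that step is omitted here.

```python
import numpy as np

def high_pairs(data, names, threshold=0.8):
    """Pairs of variables whose absolute zero-order correlation exceeds the threshold."""
    corr = np.corrcoef(data)            # each row of `data` is one variable
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j]))
    return corr, pairs

# Hypothetical series: wealth tracks income closely; interest just alternates.
t = np.arange(1, 21)
income = 100 + 5 * t
wealth = 5 * income + 2 * (-1.0) ** t
interest = (-1.0) ** t
corr, pairs = high_pairs(np.vstack([income, wealth, interest]), ["income", "wealth", "interest"])
print(np.round(corr, 3))
print(pairs)
```

As the text warns, a short list of highly correlated pairs does not rule out collinearity involving three or more variables.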


27See David G. Kleinbaum, Lawrence L. Kupper, and Keith E. Muller, Applied Regression Analysis and Other Multivariate Meth- 
ods, 2d ed., PWS-Kent, Boston, Mass., 1988, p. 210. 
Since the scatterplot function is now included in several statistical packages, this diagnostic should be considered along with the ones discussed earlier. But keep in mind that simple correlations between pairs of variables may not be a definitive indicator of collinearity, as pointed out earlier.

To conclude our discussion of detecting multicollinearity, we stress that the various methods we have discussed are essentially in the nature of “fishing expeditions,” for we cannot tell which of these methods will work in any particular application. Alas, not much can be done about it, for multicollinearity is specific to a given sample over which the researcher may not have much control, especially if the data are nonexperimental in nature, the usual fate of researchers in the social sciences.

Again as a parody of multicollinearity, Goldberger cites numerous ways of detecting micronumerosity, such as developing critical values of the sample size, n*, such that micronumerosity is a problem only if the actual sample size, n, is smaller than n*. The point of Goldberger’s parody is to emphasize that small sample size and lack of variability in the explanatory variables may cause problems that are at least as serious as those due to multicollinearity.


10.8 Remedial Measures 


What can be done if multicollinearity is serious? We have two choices: (1) do nothing or (2) follow some 
rules of thumb. 


Do Nothing 


The “do nothing” school of thought is expressed by Blanchard as follows:²⁸

When students run their first ordinary least squares (OLS) regression, the first problem that they usually 
encounter is that of multicollinearity. Many of them conclude that there is something wrong with OLS; some 
resort to new and often creative techniques to get around the problem. But, we tell them, this is wrong. Multicollinearity is God’s will, not a problem with OLS or statistical technique in general.

What Blanchard is saying is that multicollinearity is essentially a data deficiency problem (micronumerosity, again) and sometimes we have no choice over the data we have available for empirical analysis.

Also, multicollinearity does not mean that all the coefficients in a regression model are statistically insignificant. Moreover, even if we cannot estimate one or more regression coefficients with great precision, a linear combination of them (i.e., an estimable function) can be estimated relatively efficiently. As we saw in Eq. (10.2.3), we can estimate $\alpha$ uniquely, even if we cannot estimate its two components given there individually. Sometimes this is the best we can do with a given set of data.²⁹
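The estimable-function idea can be illustrated with a noise-free toy example. Assuming, as in Section 10.2, that $X_{3i} = \lambda X_{2i}$ exactly, so that the combination estimated is $\alpha = \beta_2 + \lambda\beta_3$, the sketch below shows that the full design matrix has deficient rank (so $\beta_2$ and $\beta_3$ cannot be separated) while the regression of Y on $X_2$ alone recovers $\alpha$. The parameter values are hypothetical.

```python
import numpy as np

# Hypothetical noise-free data with exact collinearity X3 = lam * X2
lam, beta1, beta2, beta3 = 2.0, 1.0, 2.0, 3.0
x2 = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 8.0])
x3 = lam * x2
y = beta1 + beta2 * x2 + beta3 * x3

X_full = np.column_stack([np.ones_like(x2), x2, x3])
print(np.linalg.matrix_rank(X_full))        # 2, not 3: beta2 and beta3 are not separately estimable

X_reduced = np.column_stack([np.ones_like(x2), x2])
coef, *_ = np.linalg.lstsq(X_reduced, y, rcond=None)
alpha_hat = coef[1]                         # estimates alpha = beta2 + lam * beta3
print(alpha_hat, beta2 + lam * beta3)
```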


Rule-of-Thumb Procedures 


One can try the following rules of thumb to address the problem of multicollinearity; their success will 
depend on the severity of the collinearity problem. 
1. A priori information. Suppose we consider the model 


$$Y_i = \beta_1 + \beta_2X_{2i} + \beta_3X_{3i} + u_i$$


²⁸O. J. Blanchard, Comment, Journal of Business and Economic Statistics, vol. 5, 1987, pp. 449-451. The quote is reproduced from Peter Kennedy, A Guide to Econometrics, 4th ed., MIT Press, Cambridge, Mass., 1998, p. 190.

²⁹For an interesting discussion on this, see J. Conlisk, “When Collinearity Is Desirable,” Western Economic Journal, vol. 9, 1971, pp. 393-407.
where Y = consumption, $X_2$ = income, and $X_3$ = wealth. As noted before, income and wealth variables tend to be highly collinear. But suppose a priori we believe that $\beta_3 = 0.10\beta_2$; that is, the rate of change of consumption with respect to wealth is one-tenth the corresponding rate with respect to income. We can then run the following regression:


$$Y_i = \beta_1 + \beta_2X_{2i} + 0.10\beta_2X_{3i} + u_i = \beta_1 + \beta_2X_i + u_i$$


where $X_i = X_{2i} + 0.1X_{3i}$. Once we obtain $\hat{\beta}_2$, we can estimate $\hat{\beta}_3$ from the postulated relationship between $\beta_2$ and $\beta_3$.
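The restricted regression amounts to building one composite regressor and running a simple regression. In the hypothetical, noise-free sketch below the data satisfy $\beta_3 = 0.10\beta_2$ exactly, so OLS on $X_i = X_{2i} + 0.1X_{3i}$ returns $\hat{\beta}_2$, and $\hat{\beta}_3$ follows from the restriction. The numbers are illustrative, not the Table 10.5 data.

```python
import numpy as np

# Hypothetical data satisfying beta3 = 0.10 * beta2 (beta2 = 0.6, beta3 = 0.06)
x2 = np.array([80.0, 100.0, 120.0, 140.0, 160.0, 180.0])
x3 = np.array([900.0, 1150.0, 1180.0, 1500.0, 1620.0, 1900.0])
y = 20.0 + 0.6 * x2 + 0.06 * x3

x_comp = x2 + 0.10 * x3                     # composite regressor X_i = X2i + 0.1 X3i
X = np.column_stack([np.ones_like(x_comp), x_comp])
(b1_hat, b2_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
b3_hat = 0.10 * b2_hat                      # recovered from the a priori restriction
print(b1_hat, b2_hat, b3_hat)
```

With real data the restriction is only as good as the prior information behind it, which is why the text recommends testing it where possible.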

How does one obtain a priori information? It could come from previous empirical work in which the 
collinearity problem happens to be less serious or from the relevant theory underlying the field of study. For 
example, in the Cobb-Douglas-type production function (7.9.1), if one expects constant returns to scale to prevail, then $(\beta_2 + \beta_3) = 1$, in which case we could run the regression (8.6.14), regressing the output-labor ratio on the capital-labor ratio. If there is collinearity between labor and capital, as generally is the case in most sample data, such a transformation may reduce or eliminate the collinearity problem. But a warning is in order here regarding imposing such a priori restrictions, “... since in general we will want to test economic theory’s a priori predictions rather than simply impose them on data for which they may not be true.”³⁰
However, we know from Section 8.6 how to test for the validity of such restrictions explicitly. 

2. Combining cross-sectional and time series data. A variant of the extraneous or a priori information 
technique is the combination of cross-sectional and time series data, known as pooling the data. Suppose we 
want to study the demand for automobiles in the United States and assume we have time series data on the 
number of cars sold, average price of the car, and consumer income. Suppose also that 


$$\ln Y_t = \beta_1 + \beta_2 \ln P_t + \beta_3 \ln I_t + u_t$$


where Y = number of cars sold, P = average price, I = income, and t = time. Our objective is to estimate the price elasticity, $\beta_2$, and income elasticity, $\beta_3$.

In time series data the price and income variables generally tend to be highly collinear. Therefore, if we 
run the preceding regression, we shall be faced with the usual multicollinearity problem. A way out of this 
has been suggested by Tobin.³¹ He says that if we have cross-sectional data (for example, data generated by consumer panels, or budget studies conducted by various private and governmental agencies), we can obtain a fairly reliable estimate of the income elasticity $\beta_3$ because in such data, which are at a point in time, the prices do not vary much. Let the cross-sectionally estimated income elasticity be $\hat{\beta}_3$. Using this estimate, we
may write the preceding time series regression as 


$$Y_t^* = \beta_1 + \beta_2 \ln P_t + u_t$$

where $Y^* = \ln Y - \hat{\beta}_3 \ln I$, that is, $Y^*$ represents that value of Y after removing from it the effect of income. We can now obtain an estimate of the price elasticity $\beta_2$ from the preceding regression.
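Tobin’s two-step procedure can be sketched as follows. The time series are hypothetical and noise-free, and the cross-sectional income elasticity (0.9 here) is simply assumed to be known. Step one removes the income effect from $\ln Y$; step two regresses the adjusted series $Y^*$ on $\ln P$ to obtain the price elasticity.

```python
import numpy as np

# Hypothetical time series; true elasticities: price -1.2, income 0.9
ln_p = np.log(np.array([10.0, 10.5, 10.8, 11.6, 12.0, 12.9, 13.1, 14.0]))
ln_i = np.log(np.array([50.0, 52.0, 55.0, 57.0, 61.0, 63.0, 67.0, 70.0]))
ln_y = 2.0 - 1.2 * ln_p + 0.9 * ln_i

beta3_cs = 0.9                                # income elasticity assumed known from cross-section data
y_star = ln_y - beta3_cs * ln_i               # Y* = ln Y - beta3_hat * ln I
X = np.column_stack([np.ones_like(ln_p), ln_p])
(b1_hat, b2_hat), *_ = np.linalg.lstsq(X, y_star, rcond=None)
print(b2_hat)                                 # price elasticity estimated from the time series
```

If the cross-sectional elasticity differs from the one that governs the time series, the error carries over directly into $\hat{\beta}_2$, which is the interpretation problem the text goes on to mention.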

Although it is an appealing technique, pooling the time series and cross-sectional data in the manner 
just suggested may create problems of interpretation, because we are assuming implicitly that the cross- 
sectionally estimated income elasticity is the same thing as that which would be obtained from a pure 
time series analysis.³² Nonetheless, the technique has been used in many applications and is worthy of


³⁰Mark B. Stewart and Kenneth F. Wallis, Introductory Econometrics, 2d ed., John Wiley & Sons, A Halstead Press Book, New York, 1981, p. 154.

³¹J. Tobin, “A Statistical Demand Function for Food in the U.S.A.,” Journal of the Royal Statistical Society, Ser. A, 1950, pp. 113-141.

³²For a thorough discussion and application of the pooling technique, see Edwin Kuh, Capital Stock Growth: A Micro-Econometric Approach, North-Holland Publishing Company, Amsterdam, 1963, Chapters 5 and 6.
consideration in situations where the cross-sectional estimates do not vary substantially from one cross
section to another. An example of this technique is provided in Exercise 10.26. 

3. Dropping a variable(s) and specification bias. When faced with severe multicollinearity, one of the 
“simplest” things to do is to drop one of the collinear variables. Thus, in our consumption-income-wealth
illustration, when we drop the wealth variable, we obtain regression (10.6.4), which shows that, whereas in 
the original model the income variable was statistically insignificant, it is now “highly” significant. 

But in dropping a variable from the model we may be committing a specification bias or specification 
error. Specification bias arises from incorrect specification of the model used in the analysis. Thus, if economic 
theory says that income and wealth should both be included in the model explaining the consumption expenditure, dropping the wealth variable would constitute specification bias.

Although we will discuss the topic of specification bias in Chapter 13, we caught a glimpse of it in Section 
7.7. If, for example, the true model is 


$$Y_i = \beta_1 + \beta_2X_{2i} + \beta_3X_{3i} + u_i$$

but we mistakenly fit the model

$$Y_i = b_1 + b_{12}X_{2i} + \hat{u}_i \qquad (10.8.1)$$

then it can be shown that (see Appendix 13A.1)

$$E(b_{12}) = \beta_2 + \beta_3 b_{32} \qquad (10.8.2)$$

where $b_{32}$ = slope coefficient in the regression of $X_3$ on $X_2$. Therefore, it is obvious from Eq. (10.8.2) that $b_{12}$ will be a biased estimate of $\beta_2$ as long as $b_{32}$ is different from zero (it is assumed that $\beta_3$ is different from zero; otherwise there is no sense in including $X_3$ in the original model).³³ Of course, if $b_{32}$ is zero, we have no multicollinearity problem to begin with. It is also clear from Eq. (10.8.2) that if both $b_{32}$ and $\beta_3$ are positive (or both are negative), $E(b_{12})$ will be greater than $\beta_2$; hence, on the average $b_{12}$ will overestimate $\beta_2$, leading to a positive bias. Similarly, if the product $b_{32}\beta_3$ is negative, on the average $b_{12}$ will underestimate $\beta_2$, leading to a negative bias.
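Eq. (10.8.2) has an exact sample counterpart that can be verified numerically: the slope from the misspecified regression (10.8.1) equals $\hat{\beta}_2 + \hat{\beta}_3 b_{32}$, where the hats come from the correctly specified regression. The simulation below is hypothetical. It uses positive $\beta_3$ and positive correlation between $X_2$ and $X_3$, so $b_{12}$ should overstate $\beta_2$, consistent with the positive bias described above.

```python
import numpy as np

def slope(y, x):
    """Slope from the simple regression of y on x with an intercept."""
    xd = x - x.mean()
    return xd @ (y - y.mean()) / (xd @ xd)

rng = np.random.default_rng(3)
n = 500
x2 = rng.normal(size=n)
x3 = 0.8 * x2 + 0.6 * rng.normal(size=n)      # X3 correlated with X2
beta2, beta3 = 1.0, 0.5
y = 2.0 + beta2 * x2 + beta3 * x3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x2, x3])
b_full, *_ = np.linalg.lstsq(X, y, rcond=None)  # correctly specified model
b12 = slope(y, x2)                              # misspecified model (10.8.1)
b32 = slope(x3, x2)                             # regression of X3 on X2

print(b12, b_full[1] + b_full[2] * b32)         # equal: sample analogue of Eq. (10.8.2)
print(beta2 + beta3 * 0.8)                      # population value of E(b12) in this design
```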

From the preceding discussion it is clear that dropping a variable from the model to alleviate the problem 
of multicollinearity may lead to the specification bias. Hence the remedy may be worse than the disease in 
some situations because, whereas multicollinearity may prevent precise estimation of the parameters of the 
model, omitting a variable may seriously mislead us as to the true values of the parameters. Recall that OLS 
estimators are BLUE despite near collinearity.

4. Transformation of variables. Suppose we have time series data on consumption expenditure, income, 
and wealth. One reason for high multicollinearity between income and wealth in such data is that over time 
both the variables tend to move in the same direction. One way of minimizing this dependence is to proceed 
as follows. 

If the relation 


$$Y_t = \beta_1 + \beta_2X_{2t} + \beta_3X_{3t} + u_t \qquad (10.8.3)$$

holds at time t, it must also hold at time t − 1 because the origin of time is arbitrary anyway. Therefore, we have

$$Y_{t-1} = \beta_1 + \beta_2X_{2,t-1} + \beta_3X_{3,t-1} + u_{t-1} \qquad (10.8.4)$$


³³Note further that if $b_{32}$ does not approach zero as the sample size is increased indefinitely, then $b_{12}$ will be not only biased but also inconsistent.
If we subtract Eq. (10.8.4) from Eq. (10.8.3), we obtain

$$Y_t - Y_{t-1} = \beta_2(X_{2t} - X_{2,t-1}) + \beta_3(X_{3t} - X_{3,t-1}) + v_t \qquad (10.8.5)$$

where $v_t = u_t - u_{t-1}$. Equation (10.8.5) is known as the first difference form because we run the regression not on the original variables but on the differences of successive values of the variables.

The first difference regression model often reduces the severity of multicollinearity because, although the levels of $X_2$ and $X_3$ may be highly correlated, there is no a priori reason to believe that their differences will also be highly correlated.
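The effect of differencing on collinearity can be seen with two hypothetical trending series whose only common element is the time trend. Their levels are almost perfectly correlated, while their first differences, computed with `numpy.diff`, are nearly uncorrelated; one observation is lost in the process. This is a simulated illustration of the possibility described in the text, not a guarantee for real data.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
t = np.arange(n)
x2 = 2.0 * t + rng.normal(size=n)     # hypothetical trending "income"
x3 = 3.0 * t + rng.normal(size=n)     # hypothetical trending "wealth"

r_levels = np.corrcoef(x2, x3)[0, 1]
dx2, dx3 = np.diff(x2), np.diff(x3)   # first differences; one observation is lost
r_diffs = np.corrcoef(dx2, dx3)[0, 1]
print(round(r_levels, 4), round(r_diffs, 4), len(dx2))
```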

As we shall see in the chapters on time series econometrics, an incidental advantage of the first difference 
transformation is that it may make a nonstationary time series stationary. In those chapters we will see 
the importance of stationary time series. As noted in Chapter 1, loosely speaking, a time series, say, $Y_t$, is
stationary if its mean and variance do not change systematically over time. 

Another commonly used transformation in practice is the ratio transformation. Consider the model: 


$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_t \qquad (10.8.6)$$


where Y is consumption expenditure in real dollars, X2 is GDP, and X3 is total population. Since GDP and
population grow over time, they are likely to be correlated. One “solution” to this problem is to express the
model on a per capita basis, that is, by dividing Eq. (10.8.6) by X3, to obtain:


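$$\frac{Y_t}{X_{3t}} = \beta_1 \left(\frac{1}{X_{3t}}\right) + \beta_2 \left(\frac{X_{2t}}{X_{3t}}\right) + \beta_3 + \left(\frac{u_t}{X_{3t}}\right) \qquad (10.8.7)$$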


Such a transformation may reduce collinearity in the original variables.

But the first difference or ratio transformations are not without problems. For instance, the error term $v_t$
in Eq. (10.8.5) may not satisfy one of the assumptions of the classical linear regression model, namely, that
the disturbances are serially uncorrelated. As we will see in Chapter 12, if the original disturbance term $u_t$ is
serially uncorrelated, the error term $v_t$ obtained previously will in most cases be serially correlated. Therefore,
the remedy may be worse than the disease. Moreover, there is a loss of one observation due to the differencing 
procedure, and therefore the degrees of freedom are reduced by one. In a small sample, this could be a factor 
one would wish at least to take into consideration. Furthermore, the first differencing procedure may not be 
appropriate in cross-sectional data where there is no logical ordering of the observations. 
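The serial correlation in $v_t$ can be seen directly. As a sketch, assume that $u_t$ has zero mean, constant variance $\sigma^2$, and no serial correlation. Then

$$
\begin{aligned}
\operatorname{var}(v_t) &= \operatorname{var}(u_t) + \operatorname{var}(u_{t-1}) = 2\sigma^2,\\
\operatorname{cov}(v_t, v_{t-1}) &= E\big[(u_t - u_{t-1})(u_{t-1} - u_{t-2})\big] = -E\big(u_{t-1}^2\big) = -\sigma^2,\\
\rho(v_t, v_{t-1}) &= \frac{-\sigma^2}{2\sigma^2} = -\tfrac{1}{2}.
\end{aligned}
$$

Under these assumptions the differenced errors are negatively autocorrelated, even though the original errors are not.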

Similarly, in the ratio model (10.8.7), the error term

$$\frac{u_t}{X_{3t}}$$

will be heteroscedastic if the original error term $u_t$ is homoscedastic, as we shall see in Chapter 11. Again,
the remedy may be worse than the disease of collinearity.
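The reason is a one-line calculation. Treat $X_{3t}$ as fixed and assume $\operatorname{var}(u_t) = \sigma^2$ for all t. Then

$$\operatorname{var}\left(\frac{u_t}{X_{3t}}\right) = \frac{\operatorname{var}(u_t)}{X_{3t}^2} = \frac{\sigma^2}{X_{3t}^2},$$

which is not constant. It shrinks as population $X_{3t}$ grows, so the transformed error variance differs from observation to observation.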

In short, one should be careful in using the first difference or ratio method of transforming the data to 
resolve the problem of multicollinearity. 

5. Additional or new data. Since multicollinearity is a sample feature, it is possible that in another sample 
involving the same variables collinearity may not be so serious as in the first sample. Sometimes simply 
increasing the size of the sample (if possible) may attenuate the collinearity problem. For example, in the 
three-variable model we saw that 


$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2 \,(1 - r_{23}^2)}$$
Now as the sample size increases, $\sum x_{2i}^2$ will generally increase. (Why?) Therefore, for any given $r_{23}$, the
variance of $\hat{\beta}_2$ will decrease, thus decreasing the standard error, which will enable us to estimate $\beta_2$ more
precisely.
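One way to answer the "(Why?)" is as follows. Adding an observation can never reduce the sum of squared deviations about the mean. This follows from the updating identity $\sum_{i=1}^{m+1}(x_i - \bar{x}_{m+1})^2 = \sum_{i=1}^{m}(x_i - \bar{x}_m)^2 + \frac{m}{m+1}(x_{m+1} - \bar{x}_m)^2$. The sketch below checks this numerically on hypothetical X2 values. It then plugs the result into the variance formula above for an assumed $\sigma^2$ and an assumed $r_{23}$ held fixed; both values are illustrative, not from the text.

```python
import numpy as np

def sum_sq_dev(x):
    """Sum of squared deviations from the mean, i.e. sum of x_i^2 in deviation form."""
    x = np.asarray(x, dtype=float)
    return float(np.sum((x - x.mean()) ** 2))

def var_beta2(sigma2, sxx2, r23):
    """Three-variable model: var(beta2_hat) = sigma^2 / (sum x2i^2 * (1 - r23^2))."""
    return sigma2 / (sxx2 * (1.0 - r23 ** 2))

# Hypothetical X2 values (illustrative only, not data from the text).
rng = np.random.default_rng(seed=1)
x2 = rng.normal(loc=100.0, scale=15.0, size=40)

# Sum of squared deviations as the sample grows from 2 to 40 observations.
# Appending x_new adds m/(m+1) * (x_new - mean_m)^2 >= 0, so it never falls.
ss = np.array([sum_sq_dev(x2[:m]) for m in range(2, 41)])

sigma2, r23 = 25.0, 0.9   # assumed error variance and collinearity, held fixed
v10 = var_beta2(sigma2, sum_sq_dev(x2[:10]), r23)
v40 = var_beta2(sigma2, sum_sq_dev(x2[:40]), r23)
print(f"var(beta2_hat), n = 10: {v10:.4f}")
print(f"var(beta2_hat), n = 40: {v40:.4f}")
```

In practice $r_{23}$ also changes when observations are added, so a larger sample lowers the variance only if the collinearity does not worsen enough to offset the larger $\sum x_{2i}^2$.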

As an illustration, consider the following regression of consumption expenditure Y on income X2 and
wealth X3 based on 10 observations:³⁴


$$\hat{Y}_i = 24.377 + 0.8716 X_{2i} - 0.0349 X_{3i}$$
$$t = (3.875) \quad (2.7726) \quad (-1.1595) \qquad R^2 = 0.9682 \qquad (10.8.8)$$


The wealth coefficient in this regression not only has the wrong sign but is also statistically insignificant 
at the 5 percent level. But when the sample size was increased to 40 observations (micronumerosity?), the 
following results were obtained: 


$$\hat{Y}_i = 2.0907 + 0.7299 X_{2i} + 0.0605 X_{3i}$$
$$t = (0.8713) \quad (6.0014) \quad (2.0014) \qquad R^2 = 0.9672 \qquad (10.8.9)$$


Now the wealth coefficient not only has the correct sign but also is statistically significant at the 5 percent 
level. 
Obtaining additional or “better” data is not always that easy, for as Judge et al. note: 


Unfortunately, economists seldom can obtain additional data without bearing large costs, much less choose the 
values of the explanatory variables they desire. In addition, when adding new variables in situations that are not 
controlled, we must be aware of adding observations that were generated by a process other than that associated 
with the original data set; that is, we must be sure that the economic structure associated with the new observations 
is the same as the original structure.³⁵


6. Reducing collinearity in polynomial regressions. In Section 7.10 we discussed polynomial regression
models. A special feature of these models is that the explanatory variable(s) appears with various powers.
Thus, in the total cubic cost function involving the regression of total cost on output, (output)², and (output)³,
as in Eq. (7.10.4), the various output terms are going to be correlated, making it difficult to estimate the
various slope coefficients precisely.³⁶ In practice, though, it has been found that if the explanatory variable(s)
is expressed in the deviation form (i.e., deviation from the mean value), multicollinearity is substantially
reduced. But even then the problem may persist,³⁷ in which case one may want to consider techniques such
as orthogonal polynomials.³⁸
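A minimal illustration of both points uses hypothetical output values 1, 2, ..., 10, not the data behind Eq. (7.10.4). In raw form, X and X² are nearly collinear. Centering removes the correlation between x and x² for these evenly spaced values, but x and x³ remain highly correlated, which is the persistence just mentioned.

```python
import numpy as np

def corr(a, b):
    """Zero-order correlation coefficient between two arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical output levels 1, 2, ..., 10 (not the data behind Eq. (7.10.4)).
x = np.arange(1, 11, dtype=float)
xc = x - x.mean()               # deviation (mean-centered) form

raw_x_x2 = corr(x, x ** 2)      # about 0.97 for these values
cen_x_x2 = corr(xc, xc ** 2)    # zero here: the centered values are symmetric about 0
cen_x_x3 = corr(xc, xc ** 3)    # still about 0.92 after centering

print(f"raw:      corr(X, X^2) = {raw_x_x2:.4f}")
print(f"centered: corr(x, x^2) = {cen_x_x2:.4f}")
print(f"centered: corr(x, x^3) = {cen_x_x3:.4f}")
```

The exact zero depends on the symmetric, evenly spaced values chosen here. With skewed data, centering reduces the correlation between x and x² but generally does not eliminate it.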

7. Other methods of remedying multicollinearity. Multivariate statistical techniques such as factor
analysis and principal components or techniques such as ridge regression are often employed to “solve”
the problem of multicollinearity. Unfortunately, these techniques are beyond the scope of this book, for they
cannot be discussed competently without resorting to matrix algebra.³⁹


34I am indebted to the late Albert Zucker for providing the results given in the following regressions.

35Judge et al., op. cit., p. 625. See also Section 10.9. 

36As noted, since the relationship between X, X², and X³ is nonlinear, polynomial regressions do not violate the assumption
of no multicollinearity of the classical model, strictly speaking.


37See R. A. Bradley and S. S. Srivastava, “Correlation and Polynomial Regression,” American Statistician, vol. 33, 1979, pp. 
11-14. 


38See Norman Draper and Harry Smith, Applied Regression Analysis, 2d ed., John Wiley & Sons, New York, 1981, pp. 
266-274. 

39A readable account of these techniques from an applied viewpoint can be found in Samprit Chatterjee and Bertram Price,
Regression Analysis by Example, John Wiley & Sons, New York, 1977, Chapters 7 and 8. See also H. D. Vinod, “A Survey of
Ridge Regression and Related Techniques for Improvements over Ordinary Least Squares,” Review of Economics and Statistics,
vol. 60, February 1978, pp. 121-131.
10.9 Is Multicollinearity Necessarily Bad? Maybe Not,
If the Objective Is Prediction Only 


It has been said that if the sole purpose of regression analysis is prediction or forecasting, then
multicollinearity is not a serious problem because the higher the R², the better the prediction.⁴⁰ But this may be
so “. . . as long as the values of the explanatory variables for which predictions are desired obey the same
near-exact linear dependencies as the original design [data] matrix X.”⁴¹ Thus, if in an estimated regression it
was found that X2 = 2X3 approximately, then in a future sample used to forecast Y, X2 should also be
approximately equal to 2X3, a condition difficult to meet in practice (see footnote 35), in which case prediction will
become increasingly uncertain. Moreover, if the objective of the analysis is not only prediction but also
reliable estimation of the parameters, serious multicollinearity will be a problem because we have seen that
it leads to large standard errors of the estimators.
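The following sketch illustrates the quoted condition on hypothetical simulated data in which X2 ≈ 2X3 by construction. It compares the variance of the estimated mean of Y at two new points, measured in units of $\sigma^2$ by the factor $x_0'(X'X)^{-1}x_0$. One point obeys the sample's collinearity pattern and the other breaks it. The data, sample size, and new points are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical data (not from the text): X2 is about twice X3 in the sample.
rng = np.random.default_rng(seed=2)
n = 30
x3 = rng.uniform(1.0, 10.0, size=n)
x2 = 2.0 * x3 + rng.normal(scale=0.05, size=n)   # near-exact linear dependence
X = np.column_stack([np.ones(n), x2, x3])
XtX_inv = np.linalg.inv(X.T @ X)

def mean_pred_var_factor(x0):
    """x0'(X'X)^{-1}x0: variance of the estimated mean of Y at x0, in units of sigma^2."""
    x0 = np.asarray(x0, dtype=float)
    return float(x0 @ XtX_inv @ x0)

keeps_pattern = mean_pred_var_factor([1.0, 10.0, 5.0])   # X2 = 2 * X3 at the new point
breaks_pattern = mean_pred_var_factor([1.0, 4.0, 5.0])   # X2 far from 2 * X3
print(f"new point obeys X2 ~ 2*X3: {keeps_pattern:.4f}")
print(f"new point breaks it:       {breaks_pattern:.4f}")
```

The first factor stays small, while the second is many times larger. The forecast is precise only along the direction the collinear sample actually explored.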

In one situation, however, multicollinearity may not pose a serious problem. This is the case when R² is
high and the regression coefficients are individually significant as revealed by the high t values. Yet
multicollinearity diagnostics, say, the condition index, indicate that there is serious collinearity in the data. When
can such a situation arise? As Johnston notes:

This can arise if individual coefficients happen to be numerically well in excess of the true value, so that 
the effect still shows up in spite of the inflated standard error and/or because the true value itself is so large 
that even an estimate on the downside still shows up as significant.⁴³


10.10 An Extended Example: The Longley Data 


We conclude this chapter by analyzing the data collected by Longley.⁴⁴ Although originally collected to
assess the computational accuracy of least-squares estimates in several computer programs, the Longley data
have become the workhorse to illustrate several econometric problems, including multicollinearity. The data
are reproduced in Table 10.8. The data are time series for the years 1947–1962 and pertain to Y = number of
people employed, in thousands; X1 = GNP implicit price deflator; X2 = GNP, millions of dollars; X3 = number
of people unemployed, in thousands; X4 = number of people in the armed forces; X5 = noninstitutionalized
population over 14 years of age; and X6 = year, equal to 1 in 1947, 2 in 1948, and so on, up to 16 in 1962.

Assume that our objective is to predict Y on the basis of the six X variables. Using EViews 6, we obtain the
following regression results: 


40See R. C. Geary, “Some Results about Relations between Stochastic Variables: A Discussion Document,” Review of Interna- 
tional Statistical Institute, vol. 31, 1963, pp. 163-181. 

41Judge et al., op. cit., p. 619. You will also find on this page proof of why, despite collinearity, one can obtain better mean
predictions if the existing collinearity structure also continues in the future samples.

42For an excellent discussion, see E. Malinvaud, Statistical Methods of Econometrics, 2d ed., North-Holland Publishing 
Company, Amsterdam, 1970, pp. 220-221. 

43J. Johnston, Econometric Methods, 3d ed., McGraw-Hill, New York, 1984, p. 249.

44J. Longley, “An Appraisal of Least-Squares Programs from the Point of the User,” Journal of the American Statistical
Association, vol. 62, 1967, pp. 819-841.
Table 10.8 Longley Data

Observation        Y       X1        X2       X3       X4        X5
1947          60,323     83.0   234,289    2,356    1,590   107,608
1948          61,122     88.5   259,426    2,325    1,456   108,632
1949          60,171     88.2   258,054    3,682    1,616   109,773
1950          61,187     89.5   284,599    3,351    1,650   110,929
1951          63,221     96.2   328,975    2,099    3,099   112,075
1952          63,639     98.1   346,999    1,932    3,594   113,270
1953          64,989     99.0   365,385    1,870    3,547   115,094
1954          63,761    100.0   363,112    3,578    3,350   116,219
1955          66,019    101.2   397,469    2,904    3,048   117,388
1956          67,857    104.6   419,180    2,822    2,857   118,734
1957          68,169    108.4   442,769    2,936    2,798   120,445
1958          66,513    110.8   444,546    4,681    2,637   121,950
1959          68,655    112.6   482,704    3,813    2,552   123,366
1960          69,564    114.2   502,601    3,931    2,514   125,368
1961          69,331    115.7   518,173    4,806    2,572   127,852
1962          70,551    116.9   554,894    4,007    2,827   130,081
Source: J. Longley, “An Appraisal of Least-Squares Programs from the Point of the User,” Journal of the American Statistical Association, vol. 62, 1967,
pp. 819-841.


Dependent Variable: Y
Sample: 1947-1962

Variable     Coefficient     Std. Error     t-Statistic     Prob.
C            -3482259.       890420.4       -3.910803       0.0036
X1            15.06187       84.91493        0.177376       0.8631
X2           -0.035819       0.033491       -1.069516       0.3127
X3           -2.020230       0.488400       -4.136427       0.0025
X4           -1.033227       0.214274       -4.821985       0.0009
X5           -0.051104       0.226073       -0.226051       0.8262
X6            1829.151       455.4785        4.015890       0.0030

R-squared              0.995479     Mean dependent var.       65317.00
Adjusted R-squared     0.992465     S.D. dependent var.       3511.968
S.E. of regression     304.8541     Akaike info criterion     14.57718
Sum squared resid.     836424.1     Schwarz criterion         14.91519
Log likelihood        -109.6174     F-statistic               330.2853
Durbin-Watson stat.    2.559488     Prob(F-statistic)         0.000000


A glance at these results would suggest that we have the collinearity problem, for the R² value is very high,
but quite a few variables are statistically insignificant (X1, X2, and X5), a classic symptom of multicollinearity.
To shed more light on this, we show in Table 10.9 the intercorrelations among the six regressors.
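Readers who want to reproduce the regression above outside EViews can use the following NumPy sketch. It is an illustration, not the program used for the book's output. Table 10.8 does not list X6, so the sketch enters it as the calendar year 1947, ..., 1962 because that coding matches the reported intercept. Coding it 1 to 16 instead would leave every slope unchanged and shift only the intercept. X1 is the deflator on its 1954 = 100 scale.

```python
import numpy as np

# Longley data from Table 10.8 (Y and X1-X5); X6 is entered as the calendar year.
y = np.array([60323, 61122, 60171, 61187, 63221, 63639, 64989, 63761,
              66019, 67857, 68169, 66513, 68655, 69564, 69331, 70551], dtype=float)
x1 = np.array([83.0, 88.5, 88.2, 89.5, 96.2, 98.1, 99.0, 100.0,
               101.2, 104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9])
x2 = np.array([234289, 259426, 258054, 284599, 328975, 346999, 365385, 363112,
               397469, 419180, 442769, 444546, 482704, 502601, 518173, 554894], dtype=float)
x3 = np.array([2356, 2325, 3682, 3351, 2099, 1932, 1870, 3578,
               2904, 2822, 2936, 4681, 3813, 3931, 4806, 4007], dtype=float)
x4 = np.array([1590, 1456, 1616, 1650, 3099, 3594, 3547, 3350,
               3048, 2857, 2798, 2637, 2552, 2514, 2572, 2827], dtype=float)
x5 = np.array([107608, 108632, 109773, 110929, 112075, 113270, 115094, 116219,
               117388, 118734, 120445, 121950, 123366, 125368, 127852, 130081], dtype=float)
x6 = np.arange(1947, 1963, dtype=float)

def ols(X, y):
    """OLS via NumPy's SVD-based least-squares solver; returns coefficients and R^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2

X = np.column_stack([np.ones_like(y), x1, x2, x3, x4, x5, x6])
beta, r2 = ols(X, y)
for name, b in zip(["C", "X1", "X2", "X3", "X4", "X5", "X6"], beta):
    print(f"{name:>2}: {b:16.6f}")
print(f"R-squared: {r2:.6f}")
```

An SVD-based solver is used rather than explicitly inverting X'X. The Longley design is close to singular, and explicit inversion is exactly the kind of computation Longley's study found to be fragile.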

This table gives what is called the correlation matrix. In this table the entries on the main diagonal (those 
running from the upper left-hand corner to the lower right-hand corner) give the correlation of one variable 
with itself, which is always 1 by definition, and the entries off the main diagonal are the pair-wise correlations 
Table 10.9 Intercorrelations

        X1          X2          X3          X4          X5          X6
X1   1.000000    0.991589    0.620633    0.464744    0.979163    0.991149
X2   0.991589    1.000000    0.604261    0.446437    0.991090    0.995273
X3   0.620633    0.604261    1.000000   -0.177421    0.686552    0.668257
X4   0.464744    0.446437   -0.177421    1.000000    0.364416    0.417245
X5   0.979163    0.991090    0.686552    0.364416    1.000000    0.993953
X6   0.991149    0.995273    0.668257    0.417245    0.993953    1.000000


among the X variables. If you take the first row of this table, this gives the correlation of X1 with the other X
variables. For example, 0.991589 is the correlation between X1 and X2, 0.620633 is the correlation between
X1 and X3, and so on.

As you can see, several of these pair-wise correlations are quite high, suggesting that there may be a severe 
collinearity problem. Of course, remember the warning given earlier that such pair-wise correlations may be 
a sufficient but not a necessary condition for the existence of multicollinearity. 

To shed further light on the nature of the multicollinearity problem, let us run the auxiliary regressions,
that is, the regression of each X variable on the remaining X variables. To save space, we will present only
the R² values obtained from these regressions, which are given in Table 10.10. Since the R² values in the
auxiliary regressions are very high (with the possible exception of the regression of X4 on the remaining X
variables), it seems that we do have a serious collinearity problem. The same information is obtained from the
tolerance factors. As noted previously, the closer the tolerance factor is to zero, the greater is the evidence of
collinearity.


Table 10.10 R² Values from the Auxiliary Regressions

Dependent Variable     R² Value     Tolerance (TOL) = 1 − R²
X1                     0.9926       0.0074
X2                     0.9994       0.0006
X3                     0.9702       0.0298
X4                     0.7213       0.2787
X5                     0.9970       0.0030
X6                     0.9986       0.0014
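The auxiliary regressions can be sketched in a few lines of NumPy. The sketch regresses each regressor on the other five plus an intercept, then reports R², TOL = 1 − R², and the variance-inflating factor VIF = 1/TOL. It reuses the Table 10.8 values and, as an assumption, codes X6 as the calendar year. The auxiliary R² values do not depend on that choice, because an intercept is included. The printed numbers should agree closely with Table 10.10, although small differences in the last digit may be due to rounding.

```python
import numpy as np

# Longley regressors X1-X6 from Table 10.8 (X6 entered as the calendar year).
x1 = np.array([83.0, 88.5, 88.2, 89.5, 96.2, 98.1, 99.0, 100.0,
               101.2, 104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9])
x2 = np.array([234289, 259426, 258054, 284599, 328975, 346999, 365385, 363112,
               397469, 419180, 442769, 444546, 482704, 502601, 518173, 554894], dtype=float)
x3 = np.array([2356, 2325, 3682, 3351, 2099, 1932, 1870, 3578,
               2904, 2822, 2936, 4681, 3813, 3931, 4806, 4007], dtype=float)
x4 = np.array([1590, 1456, 1616, 1650, 3099, 3594, 3547, 3350,
               3048, 2857, 2798, 2637, 2552, 2514, 2572, 2827], dtype=float)
x5 = np.array([107608, 108632, 109773, 110929, 112075, 113270, 115094, 116219,
               117388, 118734, 120445, 121950, 123366, 125368, 127852, 130081], dtype=float)
x6 = np.arange(1947, 1963, dtype=float)
Xs = np.column_stack([x1, x2, x3, x4, x5, x6])

def aux_r2(Xs, j):
    """R^2 from regressing column j on all other columns plus an intercept."""
    target = Xs[:, j]
    others = np.delete(Xs, j, axis=1)
    Z = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    resid = target - Z @ coef
    return 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)

r2s = [aux_r2(Xs, j) for j in range(Xs.shape[1])]
for j, r2 in enumerate(r2s, start=1):
    tol = 1.0 - r2
    print(f"X{j}: R^2 = {r2:.4f}  TOL = {tol:.4f}  VIF = {1.0 / tol:8.1f}")
```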


Applying Klein’s rule of thumb, we see that the R² values obtained from the auxiliary regressions exceed
the overall R² value (that is, the one obtained from the regression of Y on all the X variables) of 0.9955 in 3 out
of 6 auxiliary regressions (those for X2, X5, and X6), again suggesting that the Longley data are indeed plagued
by the multicollinearity problem. Incidentally, applying the F test given in Eq. (10.7.3), the reader should verify that
the R² values given in the preceding tables are all statistically significantly different from zero.
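Both checks can be scripted directly from the Table 10.10 numbers. This sketch takes Eq. (10.7.3) to have the form F = [R²/(k − 2)] / [(1 − R²)/(n − k + 1)], with k counting the parameters of the full model, intercept included. With n = 16 and k = 7 the auxiliary regressions then have 5 and 10 degrees of freedom. The 5 percent critical value of F(5, 10), about 3.33, is taken from standard tables rather than computed.

```python
# R^2 values from Table 10.10 and the overall R^2 of the full Longley regression.
aux_r2 = {"X1": 0.9926, "X2": 0.9994, "X3": 0.9702,
          "X4": 0.7213, "X5": 0.9970, "X6": 0.9986}
overall_r2 = 0.995479
n, k = 16, 7        # observations; parameters in the full model, intercept included
F_CRIT = 3.33       # approximate 5 percent critical value of F(5, 10) from standard tables

def aux_f(r2, n, k):
    """F = [R^2/(k - 2)] / [(1 - R^2)/(n - k + 1)] for an auxiliary regression."""
    return (r2 / (k - 2)) / ((1.0 - r2) / (n - k + 1))

# Klein's rule: flag auxiliary regressions whose R^2 exceeds the overall R^2.
exceeds = [name for name, r2 in aux_r2.items() if r2 > overall_r2]
print("Klein's rule flags:", exceeds)   # ['X2', 'X5', 'X6']

for name, r2 in aux_r2.items():
    f = aux_f(r2, n, k)
    print(f"{name}: F({k - 2}, {n - k + 1}) = {f:9.2f}  significant at 5%: {f > F_CRIT}")
```

Even the weakest auxiliary regression, that of X4, gives F of roughly 5.2, above the critical value.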

We noted earlier that the OLS estimators and their standard errors are sensitive to small changes in the
data. In Exercise 10.32 the reader is asked to rerun the regression of Y on all the six X variables but drop the
last observation, that is, run the regression for the period 1947–1961. You will see how the regression
results change by dropping just a single year’s observation.

Now that we have established that we have the multicollinearity problem, what “remedial” actions can 
we take? Let us reconsider our original model. First of all, we could express GNP not in nominal terms, but 
in real terms, which we can do by dividing nominal GNP by the implicit price deflator. Second, since
noninstitutional population over 14 years of age grows over time because of natural population growth, it will be
highly correlated with time, the variable X6 in our model. Therefore, instead of keeping both these variables,
we will keep the variable X5 and drop X6. Third, there is no compelling reason to include X3, the number of
people unemployed; perhaps the unemployment rate would have been a better measure of labor market
conditions. But we have no data on the latter. So, we will drop the variable X3. Making these changes, we obtain
the following regression results (RGNP = real GNP):⁴⁵


Dependent Variable: Y
Sample: 1947-1962

Variable     Coefficient     Std. Error     t-Statistic     Prob.
C             65720.37       10624.81        6.185558       0.0000
RGNP          9.736496       1.791552        5.434671       0.0002
X4           -0.687966       0.322238       -2.134965       0.0541
X5           -0.299537       0.141761       -2.112965       0.0562

R-squared              0.981404     Mean dependent var.       65317.00
Adjusted R-squared     0.976755     S.D. dependent var.       3511.968
S.E. of regression     535.4492     Akaike info criterion     15.61641
Sum squared resid.     3440470.     Schwarz criterion         15.80955
Log likelihood        -120.9313     F-statistic               211.0972
Durbin-Watson stat.    1.654069     Prob(F-statistic)         0.000000


Although the R² value has declined slightly compared with the original R², it is still very high. Now all the
estimated coefficients are significant (those of X4 and X5 only marginally, at about the 5.5 percent level), and the
signs of the coefficients make economic sense.
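The respecified model can be sketched the same way. The sketch assumes that RGNP is nominal GNP (X2) divided by the deflator X1 on its 1954 = 100 scale, which is the construction consistent with the reported RGNP coefficient. It then regresses Y on an intercept, RGNP, X4, and X5, using the Table 10.8 values.

```python
import numpy as np

# Longley series used in the respecified model (Table 10.8).
y = np.array([60323, 61122, 60171, 61187, 63221, 63639, 64989, 63761,
              66019, 67857, 68169, 66513, 68655, 69564, 69331, 70551], dtype=float)
x1 = np.array([83.0, 88.5, 88.2, 89.5, 96.2, 98.1, 99.0, 100.0,
               101.2, 104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9])   # deflator
x2 = np.array([234289, 259426, 258054, 284599, 328975, 346999, 365385, 363112,
               397469, 419180, 442769, 444546, 482704, 502601, 518173, 554894], dtype=float)
x4 = np.array([1590, 1456, 1616, 1650, 3099, 3594, 3547, 3350,
               3048, 2857, 2798, 2637, 2552, 2514, 2572, 2827], dtype=float)
x5 = np.array([107608, 108632, 109773, 110929, 112075, 113270, 115094, 116219,
               117388, 118734, 120445, 121950, 123366, 125368, 127852, 130081], dtype=float)

rgnp = x2 / x1   # real GNP: nominal GNP divided by the implicit price deflator
X = np.column_stack([np.ones_like(y), rgnp, x4, x5])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

for name, b in zip(["C", "RGNP", "X4", "X5"], beta):
    print(f"{name:>4}: {b:12.6f}")
print(f"R-squared: {r2:.6f}")
```

Swapping in other regressors here, for example keeping X6 instead of X5, is a quick way to carry out the alternative models suggested next.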

We leave it for the reader to devise alternative models and see how the results change. Also keep in mind 
the warning sounded earlier about using the ratio method of transforming the data to alleviate the problem of 
collinearity. We will revisit this question in Chapter 11. 


Summary and Conclusions 


1. One of the assumptions of the classical linear regression model is that there is no multicollinearity 
among the explanatory variables, the X’s. Broadly interpreted, multicollinearity refers to the situation 
where there is either an exact or approximately exact linear relationship among the X variables. 

2. The consequences of multicollinearity are as follows: If there is perfect collinearity among the X’s, 
their regression coefficients are indeterminate and their standard errors are not defined. If collinearity 
is high but not perfect, estimation of regression coefficients is possible but their standard errors tend to 
be large. As a result, the population values of the coefficients cannot be estimated precisely. However, 
if the objective is to estimate linear combinations of these coefficients, the estimable functions, this can 
be done even in the presence of perfect multicollinearity. 


45The coefficient of correlation between X5 and X6 is about 0.9939, a very high correlation indeed.
3. Although there are no sure methods of detecting collinearity, there are several indicators of it, which
are as follows:

(a) The clearest sign of multicollinearity is when R² is very high but none of the regression coefficients
is statistically significant on the basis of the conventional t test. This case is, of course, extreme.

(b) In models involving just two explanatory variables, a fairly good idea of collinearity can be 
obtained by examining the zero-order, or simple, correlation coefficient between the two variables. 
If this correlation is high, multicollinearity is generally the culprit. 

(c) However, the zero-order correlation coefficients can be misleading in models involving more than 
two X variables since it is possible to have low zero-order correlations and yet find high multicol- 
linearity. In situations like these, one may need to examine the partial correlation coefficients. 

(d) If R² is high but the partial correlations are low, multicollinearity is a possibility. Here one or more
variables may be superfluous. But if R² is high and the partial correlations are also high,
multicollinearity may not be readily detectable. Also, as pointed out by C. Robert Wichers, Krishna Kumar,
John O’Hagan, and Brendan McCabe, there are some statistical problems with the partial corre- 
lation test suggested by Farrar and Glauber. 

(e) Therefore, one may regress each of the Xi variables on the remaining X variables in the model and
find out the corresponding coefficients of determination R²i. A high R²i would suggest that Xi is
highly correlated with the rest of the X’s. Thus, one may drop that Xi from the model, provided it
does not lead to serious specification bias.

4. Detection of multicollinearity is half the battle. The other half is concerned with how to get rid of 
the problem. Again there are no sure methods, only a few rules of thumb. Some of these rules are as 
follows: (1) using extraneous or prior information, (2) combining cross-sectional and time series data, 
(3) omitting a highly collinear variable, (4) transforming data, and (5) obtaining additional or new data. 
Of course, which of these rules will work in practice will depend on the nature of the data and severity 
of the collinearity problem. 

5. We noted the role of multicollinearity in prediction and pointed out that unless the collinearity structure 
continues in the future sample it is hazardous to use the estimated regression that has been plagued by 
multicollinearity for the purpose of forecasting. 

6. Although multicollinearity has received extensive (some would say excessive) attention in the liter- 
ature, an equally important problem encountered in empirical research is that of micronumerosity, 
smallness of sample size. According to Goldberger, “When a research article complains about
multicollinearity, readers ought to see whether the complaints would be convincing if ‘micronumerosity’
were substituted for ‘multicollinearity.’”⁴⁶ He suggests that the reader ought to decide how small n, the
number of observations, is before deciding that one has a small-sample problem, just as one decides
how high an R² value is in an auxiliary regression before declaring that the collinearity problem is very
severe.


Multiple Choice Questions 


1. One of the assumptions of CLRM is that the number of observations in the sample must be greater than 
the number of 
a. Regressors 
b. Regressands 


46Goldberger, op. cit., p. 250. 
c. Dependent variable

d. Dependent and independent variables 
2. Perfect multicollinearity between variables X1, X2, and X3 can be expressed using constants λ1, λ2, and
λ3 such that

a. $\lambda_1 X_1 + \lambda_2 X_2 + \lambda_3 X_3 = 0$, where λ1, λ2, and λ3 are all equal to zero simultaneously

b. $\lambda_1 X_1 + \lambda_2 X_2 + \lambda_3 X_3 + v = 0$, where v is the stochastic term and λ1, λ2, and λ3 are not all equal to
zero simultaneously

c. $\lambda_1 X_1 + \lambda_2 X_2 + \lambda_3 X_3 = 0$, where λ1, λ2, and λ3 are not all equal to zero simultaneously

d. $\lambda_1 X_1 + \lambda_2 X_2 + \lambda_3 X_3 + v = 0$, where v is the stochastic term and λ1, λ2, and λ3 are all equal to zero
simultaneously

3. In a regression model $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$, the F test is statistically significant at less than the
5 percent level of significance, but the coefficients $\hat{\beta}_2$ and $\hat{\beta}_3$ are individually statistically insignificant.
This means that the

a. Two coefficients are highly correlated 

b. Two variables are highly correlated 

c. Two variables are perfectly correlated 

d. Two variables are not correlated 
4. If for a set of explanatory variables X2 and X3 the coefficient of correlation is equal to 1, this means
that between X2 and X3 there exists

a. No collinearity 

b. Low level of collinearity 

c. Perfect collinearity 

d. Very high collinearity 


5. If there exists high multicollinearity, then the regression coefficients are


a. Determinate 

b. Indeterminate 

c. Infinite values 

d. Small negative value 


6. If multicollinearity is perfect in a regression model, then the regression coefficients of the explanatory
variables are

a. Determinate 

b. Indeterminate 

c. Infinite values 

d. Small negative value 
7. If multicollinearity is perfect in a regression model, the standard errors of the regression coefficients are

a. Determinate 

b. Indeterminate 

c. Infinite values 

d. Small negative value 


8. The coefficients of explanatory variables in a regression model with less than perfect multicollinearity
cannot be estimated with great precision and accuracy. This statement is
a. Always true 
b. Always false 
c. Sometimes true 
d. Nonsense statement 


9. In a regression model with multicollinearity being very high, the estimators
a. Are unbiased
b. Are consistent
c. Standard errors are correctly estimated
d. All of the above
10. Micronumerosity in a regression model, according to Goldberger, refers to
a. A type of multicollinearity
b. Sample size n being zero
c. Sample size n being slightly greater than the number of parameters to be estimated
d. Sample size n being just smaller than the number of parameters to be estimated
11. Multicollinearity is essentially a
a. Sample phenomenon
b. Population phenomenon
c. Both a and b
d. Either a or b
12. Which of the following statements is NOT TRUE about a regression model in the presence of
multicollinearity?
a. The t ratios of coefficients tend to be statistically insignificant
b. R² is high
c. OLS estimators are not BLUE
d. OLS estimators are sensitive to small changes in the data
13. Which of these is NOT a symptom of multicollinearity in a regression model?
a. High R² with few significant t ratios for coefficients
b. High pair-wise correlations among regressors
c. High R² and all partial correlations among regressors
d. VIF of a variable is below 10
14. A sure way of removing multicollinearity from the model is to
a. Work with panel data
b. Drop variables that cause multicollinearity in the first place
c. Transform the variables by first differencing them
d. Obtain additional sample data
15. The assumption of ‘No multicollinearity’ means the correlation between the regressand and regressor is
a. High
b. Low
c. Zero
d. Any of the above
16. An example of a perfectly collinear relationship is a quadratic or cubic function. This statement is
a. True
b. False
c. Depends on the functional form
d. Depends on economic theory
17. Multicollinearity is limited to
a. Cross-section data
b. Time series data
c. Pooled data
d. All of the above
18. Multicollinearity does not hurt if the objective of the estimation is
a. Forecasting only 
b. Prediction only 
c. Getting reliable estimation of parameters 
d. Prediction or forecasting 
19. As a remedy to multicollinearity, doing this may lead to specification bias 
a. Transforming the variables 
b. Adding new data 
c. Dropping one of the collinear variables 
d. First differencing the successive values of the variable 
20. F test in most cases will reject the hypothesis that the partial slope coefficients are simultaneously equal 
to zero. This happens when 
a. Multicollinearity is present 
b. Multicollinearity is absent 
c. Multicollinearity may be present OR may not be present 
d. Depends on the F-value 


Exercises 


Questions 


10.1. In the k-variable linear regression model there are k normal equations to estimate the k unknowns.
These normal equations are given in Appendix C. Assume that Xk is a perfect linear combination of
the remaining X variables. How would you show that in this case it is impossible to estimate the k
regression coefficients?


Table 10.11

   Y     X2    X3
 -10      1     1
  -8      2     3
  -6      3     5
  -4      4     7
  -2      5     9
   0      6    11
   2      7    13
   4      8    15
   6      9    17
   8     10    19
  10     11    21


10.2. Consider the set of hypothetical data in Table 10.11. Suppose you want to fit the model

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$

to the data.

a. Can you estimate the three unknowns? Why or why not?

b. If not, what linear functions of these parameters, the estimable functions, can you estimate? Show
the necessary calculations.

10.3. Refer to the child mortality example discussed in Chapter 8 (Example 8.1). The example there involved
the regression of the child mortality (CM) rate on per capita GNP (PGNP) and female literacy rate
(FLR). Now suppose we add the variable, total fertility rate (TFR). This gives the following regression
results.

Dependent Variable: CM

Variable     Coefficient     Std. Error     t-Statistic     Prob.
C             168.3067       32.89165        5.1170         0.0000
PGNP         -0.005511       0.001878       -2.934275       0.0047
FLR          -1.768029       0.248017       -7.128663       0.0000
TFR           12.86864       4.190533        3.070883       0.0032

R-squared              0.747372     Mean dependent var.       141.5000
Adjusted R-squared     0.734740     S.D. dependent var.       75.97807
S.E. of regression     39.13127     Akaike info criterion     10.23218
Sum squared resid.     91875.38     Schwarz criterion         10.36711
Log likelihood        -323.4298     F-statistic               59.16767
Durbin-Watson stat.    2.170318     Prob(F-statistic)         0.000000

a. Compare these regression results with those given in Eq. (8.1.4). What changes do you see? How
do you account for them?

b. Is it worth adding the variable TFR to the model? Why?

c. Since all the individual t values are statistically significant, can we say that we do not have a
collinearity problem in the present case?

10.4. If the relation $\lambda_1 X_{1i} + \lambda_2 X_{2i} + \lambda_3 X_{3i} = 0$ holds true for all values of λ1, λ2, and λ3, estimate $r_{12.3}$, $r_{13.2}$,
and $r_{23.1}$. Also find $R^2_{1.23}$, $R^2_{2.13}$, and $R^2_{3.12}$. What is the degree of multicollinearity in this situation?
Note: $R^2_{1.23}$ is the coefficient of determination in the regression of Y (= X1) on X2 and X3. Other R² values are
to be interpreted similarly.

10.5. Consider the following model:

$$Y_t = \beta_1 + \beta_2 X_t + \beta_3 X_{t-1} + \beta_4 X_{t-2} + \beta_5 X_{t-3} + \beta_6 X_{t-4} + u_t$$

where Y = consumption, X = income, and t = time. The preceding model postulates that consumption
expenditure at time t is a function not only of income at time t but also of income through previous
periods. Thus, consumption expenditure in the first quarter of 2000 is a function of income in that
quarter and the four quarters of 1999. Such models are called distributed lag models, and we shall
discuss them in a later chapter.

a. Would you expect multicollinearity in such models and why?

b. If collinearity is expected, how would you resolve the problem?

10.6. Consider the illustrative example of Section 10.6 (Example 10.1). How would you reconcile the
difference in the marginal propensity to consume obtained from Eqs. (10.6.1) and (10.6.4)?

10.7. In data involving economic time series such as GNP, money supply, prices, income, unemployment,
etc., multicollinearity is usually suspected. Why?
10.8. Suppose in the model

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$

that $r_{23}$, the coefficient of correlation between X2 and X3, is zero. Therefore, someone suggests that
you run the following regressions:

$$Y_i = \alpha_1 + \alpha_2 X_{2i} + u_{1i}$$
$$Y_i = \gamma_1 + \gamma_3 X_{3i} + u_{2i}$$

a. Will $\hat{\alpha}_2 = \hat{\beta}_2$ and $\hat{\gamma}_3 = \hat{\beta}_3$? Why?
b. Will $\hat{\beta}_1$ equal $\hat{\alpha}_1$ or $\hat{\gamma}_1$ or some combination thereof?
c. Will $\operatorname{var}(\hat{\beta}_2) = \operatorname{var}(\hat{\alpha}_2)$ and $\operatorname{var}(\hat{\beta}_3) = \operatorname{var}(\hat{\gamma}_3)$?

10.9. Refer to the illustrative example of Chapter 7 where we fitted the Cobb-Douglas production function
to the manufacturing sector of all 50 states and the District of Columbia for 2005. The results of the
regression given in Eq. (7.9.4) show that both the labor and capital coefficients are individually
statistically significant.
a. Find out whether the variables labor and capital are highly correlated.
b. If your answer to (a) is affirmative, would you drop, say, the labor variable from the model and
regress the output variable on capital input only?
c. If you do so, what kind of specification bias is committed? Find out the nature of this bias.

10.10. Refer to Example 7.4. For this problem the correlation matrix is as follows:

         Xi        Xi²       Xi³
Xi       1         0.9742    0.9284
Xi²                1.0       0.9872
Xi³                          1.0

a. “Since the zero-order correlations are very high, there must be serious multicollinearity.” Comment.

b. Would you drop variables Xi² and Xi³ from the model?

c. If you drop them, what will happen to the value of the coefficient of Xi?

10.11. Stepwise regression. In deciding on the “best” set of explanatory variables for a regression model,
researchers often follow the method of stepwise regression. In this method one proceeds either by
introducing the X variables one at a time (stepwise forward regression) or by including all the
possible X variables in one multiple regression and rejecting them one at a time (stepwise backward
regression). The decision to add or drop a variable is usually made on the basis of the contribution of
that variable to the ESS, as judged by the F test. Knowing what you do now about multicollinearity,
would you recommend either procedure? Why or why not?⁴⁷

10.12. State with reason whether the following statements are true, false, or uncertain:

a. Despite perfect multicollinearity, OLS estimators are BLUE. 

b. In cases of high multicollinearity, it is not possible to assess the individual significance of one or 
more partial regression coefficients. 


c. If an auxiliary regression shows that a particular R? is high, there is definite evidence of high 
collinearity. 


“See if your reasoning agrees with that of Arthur S. Goldberger and D. B. Jochems, “Note on Stepwise Least-Squares,” Jour- 
nal of the American Statistical Association, vol. 56, March 1961, pp. 105-110. 
d. High pair-wise correlations do not suggest that there is high multicollinearity.

e. Multicollinearity is harmless if the objective of the analysis is prediction only.

f. Ceteris paribus, the higher the VIF is, the larger the variances of OLS estimators.

g. The tolerance (TOL) is a better measure of multicollinearity than the VIF.

h. You will not obtain a high $R^2$ value in a multiple regression if all the partial slope coefficients are individually statistically insignificant on the basis of the usual $t$ test.

i. In the regression of $Y$ on $X_2$ and $X_3$, suppose there is little variability in the values of $X_3$. This would increase $\operatorname{var}(\hat{\beta}_3)$. In the extreme, if all $X_3$ are identical, $\operatorname{var}(\hat{\beta}_3)$ is infinite.
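Statements (f) and (g) turn on how the VIF and TOL are computed. The sketch below is a minimal illustration, assuming NumPy and a regressor matrix that excludes the intercept column: each $\text{VIF}_j$ comes from the auxiliary regression of $X_j$ on an intercept and the remaining regressors, with $\text{VIF}_j = 1/(1 - R_j^2)$ and $\text{TOL}_j = 1/\text{VIF}_j$. The function names and the simulated data are hypothetical.

```python
import numpy as np

def r_squared(y, X):
    """R^2 from an OLS regression of y on X (X must already contain an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def vif_tol(X):
    """VIF_j = 1 / (1 - R_j^2) and TOL_j = 1 / VIF_j for each column of X (regressors only)."""
    n, p = X.shape
    vif = np.empty(p)
    for j in range(p):
        # Auxiliary regression: X_j on an intercept and all the other regressors
        aux = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        vif[j] = 1.0 / (1.0 - r_squared(X[:, j], aux))
    return vif, 1.0 / vif

# Simulated illustration (not data from the text)
rng = np.random.default_rng(0)
x2 = rng.normal(size=200)
x3 = x2 + 0.1 * rng.normal(size=200)   # nearly collinear with x2
x4 = rng.normal(size=200)              # unrelated to x2 and x3
vif, tol = vif_tol(np.column_stack([x2, x3, x4]))
print("VIF:", vif.round(1))
print("TOL:", tol.round(3))
```

The same helper can be reused for Exercise 10.30(c) once the regressors of Table 10.15 are loaded into a matrix.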

10.13. a. Show that if $r_{1i} = 0$ for $i = 2, 3, \ldots, k$, then
$$R_{1.23\ldots k} = 0$$
b. What is the importance of this finding for the regression of variable $X_1\,(= Y)$ on $X_2, X_3, \ldots, X_k$?

10.14. Suppose all the zero-order correlation coefficients of $X_1\,(= Y), X_2, \ldots, X_k$ are equal to $r$.
a. What is the value of $R^2_{1.23\ldots k}$?
b. What are the values of the first-order correlation coefficients?

*10.15. In matrix notation it can be shown (see Appendix C) that
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
a. What happens to $\hat{\boldsymbol{\beta}}$ when there is perfect collinearity among the X’s?
b. How would you know if perfect collinearity exists?
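For Exercises 10.13 and 10.14 it can help to compute $R^2_{1.23\ldots k}$ directly from a correlation matrix before deriving the result. A sketch, assuming NumPy and the standard identity $R^2_{1.23\ldots k} = 1 - 1/r^{11}$, where $r^{11}$ is the (1, 1) element of the inverse of the full correlation matrix with $Y = X_1$ in the first row and column. The helper names are hypothetical.

```python
import numpy as np

def r2_from_corr(R):
    """Multiple R^2 of variable 1 (Y) on variables 2..k, from their k x k correlation matrix R."""
    R_inv = np.linalg.inv(R)
    return 1.0 - 1.0 / R_inv[0, 0]

def equicorrelated(k, r):
    """k x k correlation matrix whose off-diagonal elements all equal r."""
    return (1.0 - r) * np.eye(k) + r * np.ones((k, k))

# Explore Exercise 10.14 numerically for k = 4 (X1 = Y, X2, X3, X4)
for r in (0.2, 0.5, 0.8):
    print(r, round(r2_from_corr(equicorrelated(4, r)), 4))

# Exercise 10.13 setting: Y uncorrelated with every X, while X2 and X3 are correlated
R0 = np.eye(3)
R0[1, 2] = R0[2, 1] = 0.6
print(round(r2_from_corr(R0), 4))
```

Comparing the printed values for several $r$ with a candidate closed-form expression is a quick check on the algebra.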
*10.16. Using matrix notation, it can be shown that
$$\text{var-cov}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}$$
What happens to this var-cov matrix:
a. When there is perfect multicollinearity?
b. When collinearity is high but not perfect?

*10.17. Consider the following correlation matrix:
$$
\mathbf{R} =
\begin{array}{c|cccc}
 & X_2 & X_3 & \cdots & X_k \\ \hline
X_2 & 1 & r_{23} & \cdots & r_{2k} \\
X_3 & & 1 & \cdots & r_{3k} \\
\vdots & & & \ddots & \vdots \\
X_k & & & & 1
\end{array}
$$
Describe how you would find out from the correlation matrix whether (a) there is perfect collinearity, (b) there is less than perfect collinearity, and (c) the X’s are uncorrelated.
Hint: You may use $|\mathbf{R}|$ to answer these questions, where $|\mathbf{R}|$ denotes the determinant of $\mathbf{R}$.
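Exercises 10.16(b) and 10.17 can be explored together numerically. The sketch below, assuming NumPy and $\sigma^2 = 1$, builds two centered regressors whose sample correlation is exactly $\rho$, then prints the diagonal element of $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$ belonging to $\hat{\beta}_2$ alongside the determinant of the $2 \times 2$ correlation matrix of $X_2$ and $X_3$ as $\rho$ approaches 1. The way the regressors are generated is an illustrative choice, not a method from the text.

```python
import numpy as np

def design(rho, n=50, seed=0):
    """Intercept plus two centered regressors X2, X3 whose sample correlation is exactly rho."""
    rng = np.random.default_rng(seed)
    x2 = rng.normal(size=n)
    x2 -= x2.mean()
    z = rng.normal(size=n)
    z -= z.mean()
    z -= (z @ x2) / (x2 @ x2) * x2               # make z orthogonal to x2
    z *= np.linalg.norm(x2) / np.linalg.norm(z)  # give z the same length as x2
    x3 = rho * x2 + np.sqrt(1.0 - rho**2) * z
    return np.column_stack([np.ones(n), x2, x3])

sigma2 = 1.0  # assumed error variance
for rho in (0.0, 0.9, 0.99, 0.999):
    X = design(rho)
    vcov = sigma2 * np.linalg.inv(X.T @ X)   # var-cov(beta_hat) = sigma^2 (X'X)^{-1}
    R = np.corrcoef(X[:, 1], X[:, 2])        # 2 x 2 correlation matrix of X2 and X3
    print(rho, round(vcov[1, 1], 5), round(np.linalg.det(R), 6))
```

At $\rho = 1$ exactly, $\mathbf{X}'\mathbf{X}$ is singular, which is the situation in part (a) of Exercise 10.16.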

*10.18. Orthogonal explanatory variables. Suppose in the model
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i$$
$X_2$ to $X_k$ are all uncorrelated. Such variables are called orthogonal variables. If this is the case:
a. What will be the structure of the $(\mathbf{X}'\mathbf{X})$ matrix?
b. How would you obtain $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$?
c. What will be the nature of the var-cov matrix of $\hat{\boldsymbol{\beta}}$?
d. Suppose you have run the regression and afterward you want to introduce another orthogonal variable, say, $X_{k+1}$, into the model. Do you have to recompute all the previous coefficients $\hat{\beta}_1$ to $\hat{\beta}_k$? Why or why not?

*Optional.
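Part (d) can be checked on simulated data. A sketch, assuming NumPy: regressors that are exactly orthogonal to one another and to the intercept are built from a QR decomposition (an illustrative device, not from the text), Y is regressed first on $X_2$ and $X_3$, then an orthogonal $X_4$ is added, and the two sets of coefficients are compared. The model coefficients used to generate Y are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
# QR of [1, random columns]: the first column of Q is proportional to the intercept,
# so the remaining columns are centered and mutually orthogonal, hence uncorrelated.
A = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
Q, _ = np.linalg.qr(A)
x2, x3, x4 = Q[:, 1], Q[:, 2], Q[:, 3]
y = 1.0 + 2.0 * x2 - 1.5 * x3 + 0.5 * x4 + rng.normal(scale=0.1, size=n)  # hypothetical model

def ols(y, *cols):
    """OLS coefficients of y on an intercept and the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_before = ols(y, x2, x3)       # intercept, beta2, beta3
b_after = ols(y, x2, x3, x4)    # same, plus beta4 for the newly added orthogonal X4
print(b_before)
print(b_after[:3])
```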
10.19. Consider the following model:
$$\text{GNP}_t = \beta_1 + \beta_2 M_t + \beta_3 M_{t-1} + \beta_4 (M_t - M_{t-1}) + u_t$$
where $\text{GNP}_t$ = GNP at time $t$, $M_t$ = money supply at time $t$, $M_{t-1}$ = money supply at time $(t - 1)$, and $(M_t - M_{t-1})$ = change in the money supply between time $t$ and time $(t - 1)$. This model thus postulates that the level of GNP at time $t$ is a function of the money supply at time $t$ and time $(t - 1)$ as well as the change in the money supply between these time periods.
a. Assuming you have the data to estimate the preceding model, would you succeed in estimating all the coefficients of this model? Why or why not?
b. If not, what coefficients can be estimated?
c. Suppose that the $\beta_3 M_{t-1}$ term were absent from the model. Would your answer to (a) be the same?
d. Repeat (c), assuming that the term $\beta_2 M_t$ were absent from the model.
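A quick numerical way to approach (a), (c), and (d) is to stack the regressors into a data matrix and compare its rank with its number of columns. A sketch, assuming NumPy and an arbitrary, made-up money-supply series (not data from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 100 + np.cumsum(rng.normal(size=31))  # hypothetical money-supply series, 31 periods
M_t, M_lag = M[1:], M[:-1]                # M_t and M_{t-1}
dM = M_t - M_lag                          # (M_t - M_{t-1})
ones = np.ones_like(M_t)

designs = {
    "full model": np.column_stack([ones, M_t, M_lag, dM]),
    "without beta3*M_{t-1}": np.column_stack([ones, M_t, dM]),
    "without beta2*M_t": np.column_stack([ones, M_lag, dM]),
}
for name, X in designs.items():
    # If rank < number of columns, X'X is singular and not every coefficient is identified
    print(f"{name}: {X.shape[1]} columns, rank {np.linalg.matrix_rank(X)}")
```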

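10.20. Show that Eqs. (7.4.7) and (7.4.8) can also be expressed as
$$\hat{\beta}_2 = \frac{\left(\sum y_i x_{2i}\right)\left(\sum x_{3i}^2\right) - \left(\sum y_i x_{3i}\right)\left(\sum x_{2i} x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right)\left(1 - r_{23}^2\right)}$$
$$\hat{\beta}_3 = \frac{\left(\sum y_i x_{3i}\right)\left(\sum x_{2i}^2\right) - \left(\sum y_i x_{2i}\right)\left(\sum x_{2i} x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right)\left(1 - r_{23}^2\right)}$$
where $r_{23}$ is the coefficient of correlation between $X_2$ and $X_3$.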
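Before proving the identity, it can be sanity-checked on simulated data. A sketch, assuming NumPy: $\hat{\beta}_2$ and $\hat{\beta}_3$ are computed from the deviation-form expressions above (lowercase letters are deviations from sample means) and compared with a direct least-squares fit. The simulated coefficients are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
X2 = rng.normal(size=n)
X3 = 0.6 * X2 + rng.normal(size=n)                 # correlated regressors
Y = 1.0 + 0.8 * X2 - 0.4 * X3 + rng.normal(size=n)

# Deviations from sample means
y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()
S22, S33, S23 = x2 @ x2, x3 @ x3, x2 @ x3
Sy2, Sy3 = y @ x2, y @ x3
r23_sq = S23**2 / (S22 * S33)                      # squared sample correlation of X2, X3

b2 = (Sy2 * S33 - Sy3 * S23) / (S22 * S33 * (1 - r23_sq))
b3 = (Sy3 * S22 - Sy2 * S23) / (S22 * S33 * (1 - r23_sq))

beta, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X2, X3]), Y, rcond=None)
print(b2, b3)
print(beta[1:])
```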
10.21. Using Eqs. (7.4.12) and (7.4.15), show that when there is perfect collinearity, the variances of $\hat{\beta}_2$ and $\hat{\beta}_3$ are infinite.
10.22. Verify that the standard errors of the sums of the slope coefficients estimated from Eqs. (10.5.6) and 
(10.5.7) are, respectively, 0.1549 and 0.1825. (See Section 10.5.) 


10.23. For the k-variable regression model, it can be shown that the variance of the kth (k = 2, 3, ..., K) partial regression coefficient given in Eq. (7.5.6) can also be expressed as*
$$\operatorname{var}(\hat{\beta}_k) = \frac{1}{n - k}\left(\frac{\sigma_y^2}{\sigma_k^2}\right)\left(\frac{1 - R^2}{1 - R_k^2}\right)$$
where $\sigma_y^2$ = variance of Y, $\sigma_k^2$ = variance of the kth explanatory variable, $R_k^2$ = $R^2$ from the regression of $X_k$ on the remaining X variables, and $R^2$ = coefficient of determination from the multiple regression, that is, regression of Y on all the X variables.

a. Other things the same, if $\sigma_k^2$ increases, what happens to $\operatorname{var}(\hat{\beta}_k)$? What are the implications for the multicollinearity problem?

b. What happens to the preceding formula when collinearity is perfect?

c. True or false: “The variance of $\hat{\beta}_k$ decreases as $R^2$ rises, so that the effect of a high $R_k^2$ can be offset by a high $R^2$.”

*This formula is given by R. Stone, “The Analysis of Market Demand,” Journal of the Royal Statistical Society, vol. B7, 1945, p. 297. Also recall Eq. (7.5.6). For further discussion, see Peter Kennedy, A Guide to Econometrics, 2d ed., The MIT Press, Cambridge, Mass., 1985, p. 156.
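The identity in Exercise 10.23 can be checked numerically against the usual OLS variance estimate. A sketch, assuming NumPy and simulated data: $\sigma_y^2$ and $\sigma_k^2$ are computed with the same divisor (n), so it cancels in their ratio, and k counts all estimated parameters, including the intercept.

```python
import numpy as np

def r2(y, X):
    """R^2 from an OLS regression of y on X (X includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return 1.0 - e @ e / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(4)
n = 60
X2 = rng.normal(size=n)
X3 = 0.7 * X2 + rng.normal(size=n)
Y = 2.0 + 1.0 * X2 + 0.5 * X3 + rng.normal(size=n)
X = np.column_stack([np.ones(n), X2, X3])
k = X.shape[1]                                  # number of parameters, incl. intercept

# Usual estimate: sigma_hat^2 times the (X'X)^{-1} element for the last coefficient (X3)
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
e = Y - X @ beta
sigma2_hat = e @ e / (n - k)
var_ols = sigma2_hat * np.linalg.inv(X.T @ X)[2, 2]

# Stone's form: (1/(n-k)) * (sigma_y^2 / sigma_k^2) * (1 - R^2) / (1 - R_k^2)
R2 = r2(Y, X)
R2_k = r2(X3, X[:, :2])                         # X3 on intercept and X2
var_stone = (Y.var() / X3.var()) * (1 - R2) / (1 - R2_k) / (n - k)
print(var_ols, var_stone)
```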
10.24. From the annual data for the U.S. manufacturing sector for 1899–1922, Dougherty obtained the following regression results:*
$$\log Y = 2.81 - 0.53 \log K + 0.91 \log L + 0.047t \qquad (1)$$
se = (1.38)   (0.34)   (0.14)   (0.021)
$R^2 = 0.97$   $F = 189.8$

where Y = index of real output, K = index of real capital input, L = index of real labor input, t = time or trend.

Using the same data, he also obtained the following regression:
$$\log (Y/L) = -0.11 + 0.11 \log (K/L) + 0.006t \qquad (2)$$
se = (0.03)   (0.15)   (0.006)
$R^2 = 0.65$   $F = 19.5$

a. Is there multicollinearity in regression (1)? How do you know?
b. In regression (1), what is the a priori sign of log K? Do the results conform to this expectation? Why or why not?
c. How would you justify the functional form of regression (1)? (Hint: Cobb-Douglas production function.)
d. Interpret regression (1). What is the role of the trend variable in this regression?
e. What is the logic behind estimating regression (2)?
f. If there was multicollinearity in regression (1), has that been reduced by regression (2)? How do you know?
g. If regression (2) is a restricted version of regression (1), what restriction is imposed by the author? (Hint: returns to scale.) How do you know if this restriction is valid? Which test do you use? Show all your calculations.
h. Are the $R^2$ values of the two regressions comparable? Why or why not? How would you make them comparable, if they are not comparable in the present form?
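For part (g), the usual test of linear restrictions can be written in terms of residual sums of squares (the $R^2$ form of the same test requires the same dependent variable in both regressions). A minimal sketch in plain Python; the helper name `restricted_f` and the RSS values in the example call are hypothetical, not Dougherty’s results. The statistic is compared with the critical F value for (m, n − k) degrees of freedom.

```python
def restricted_f(rss_r, rss_ur, m, n, k):
    """Restricted least-squares F statistic, with (m, n - k) degrees of freedom.

    rss_r  : residual sum of squares of the restricted regression
    rss_ur : residual sum of squares of the unrestricted regression
    m      : number of linear restrictions imposed
    n, k   : sample size and number of parameters in the unrestricted regression
    """
    return ((rss_r - rss_ur) / m) / (rss_ur / (n - k))

# Hypothetical RSS values, only to show the call.
# n = 24 annual observations (1899-1922); k = 4 parameters in regression (1).
F = restricted_f(rss_r=0.080, rss_ur=0.070, m=1, n=24, k=4)
print(round(F, 3))
```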

10.25. Critically evaluate the following statements:

a. “In fact, multicollinearity is not a modeling error. It is a condition of deficient data.”†

b. “If it is not feasible to obtain more data, then one must accept the fact that the data one has contain a limited amount of information and must simplify the model accordingly. Trying to estimate models that are too complicated is one of the most common mistakes among inexperienced applied econometricians.”‡

c. “It is common for researchers to claim that multicollinearity is at work whenever their hypothesized signs are not found in the regression results, when variables that they know a priori to be important have insignificant t values, or when various regression results are changed substantively whenever an explanatory variable is deleted. Unfortunately, none of these conditions is either necessary or sufficient for the existence of collinearity, and furthermore none provides any useful suggestions as to what kind of extra information might be required to solve the estimation problem they present.”§

d. “...any time series regression containing more than four independent variables results in garbage.”‖

*Christopher Dougherty, Introduction to Econometrics, Oxford University Press, New York, 1992, pp. 159–160.
†Samprit Chatterjee, Ali S. Hadi, and Bertram Price, Regression Analysis by Example, 3d ed., John Wiley & Sons, New York, 2000, p. 226.
‡Russell Davidson and James G. MacKinnon, Estimation and Inference in Econometrics, Oxford University Press, New York, 1993, p. 186.


Empirical Exercises 


10.26. Klein and Goldberger attempted to fit the following regression model to the U.S. economy:
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + u_i$$
where Y = consumption, $X_2$ = wage income, $X_3$ = nonwage, nonfarm income, and $X_4$ = farm income. But since $X_2$, $X_3$, and $X_4$ are expected to be highly collinear, they obtained estimates of $\beta_3$ and $\beta_4$ from cross-sectional analysis as follows: $\beta_3 = 0.75\beta_2$ and $\beta_4 = 0.625\beta_2$. Using these estimates, they reformulated their consumption function as follows:
$$Y_i = \beta_1 + \beta_2 (X_{2i} + 0.75 X_{3i} + 0.625 X_{4i}) + u_i = \beta_1 + \beta_2 Z_i + u_i$$
where $Z_i = X_{2i} + 0.75 X_{3i} + 0.625 X_{4i}$.
a. Fit the modified model to the data in Table 10.12 and obtain estimates of $\beta_1$ to $\beta_4$.
b. How would you interpret the variable Z?

Table 10.12

Year      Y       X2      X3      X4
1936     62.8    43.41   17.10   3.96
1937     65.0    46.44   18.65   5.48
1938     63.9    44.35   17.09   4.37
1939     67.5    47.82   19.28   4.51
1940     71.3    51.02   23.24   4.88
1941     76.6    58.71   28.11   6.37
1945*    86.3    87.69   30.29   8.96
1946     95.7    76.73   28.26   9.76
1947     98.3    75.91   27.91   9.31
1948    100.3    77.62   32.30   9.85
1949    103.2    78.01   31.39   7.21
1950    108.9    83.57   35.61   7.39
1951    108.5    90.59   37.58   7.98
1952    111.4    95.47   35.17   7.42

*The data for the war years 1942–1944 are missing. The data for other years are billions of 1939 dollars.

Source: L. R. Klein and A. S. Goldberger, An Economic Model of the United States, 1929–1952, North Holland Publishing Company, Amsterdam, 1964, p. 131.
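A sketch of part (a), assuming NumPy: the data are typed in from Table 10.12, Z is built with the cross-sectional weights 0.75 and 0.625, $\beta_1$ and $\beta_2$ come from OLS of Y on Z, and $\beta_3$ and $\beta_4$ are then implied by the restrictions. Variable names are illustrative.

```python
import numpy as np

# Table 10.12, billions of 1939 dollars (1942-1944 missing in the source)
Y = np.array([62.8, 65.0, 63.9, 67.5, 71.3, 76.6, 86.3,
              95.7, 98.3, 100.3, 103.2, 108.9, 108.5, 111.4])
X2 = np.array([43.41, 46.44, 44.35, 47.82, 51.02, 58.71, 87.69,
               76.73, 75.91, 77.62, 78.01, 83.57, 90.59, 95.47])
X3 = np.array([17.10, 18.65, 17.09, 19.28, 23.24, 28.11, 30.29,
               28.26, 27.91, 32.30, 31.39, 35.61, 37.58, 35.17])
X4 = np.array([3.96, 5.48, 4.37, 4.51, 4.88, 6.37, 8.96,
               9.76, 9.31, 9.85, 7.21, 7.39, 7.98, 7.42])

Z = X2 + 0.75 * X3 + 0.625 * X4                       # composite income variable
(b1, b2), *_ = np.linalg.lstsq(np.column_stack([np.ones_like(Z), Z]), Y, rcond=None)
b3, b4 = 0.75 * b2, 0.625 * b2                        # implied by the cross-sectional restrictions
print(f"b1={b1:.4f}  b2={b2:.4f}  b3={b3:.4f}  b4={b4:.4f}")
```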
10.27. Table 10.13 gives data on imports, GDP, and the Wholesale Price Index (WPI) for India over the period 1980–81 to 2008–09. You are asked to consider the following model:
$$\ln \text{Imports}_t = \beta_1 + \beta_2 \ln \text{GDP}_t + \beta_3 \ln \text{WPI}_t + u_t$$
a. Estimate the parameters of this model using the data given in the table.
b. Do you suspect that there is multicollinearity in the data?
c. Regress: (1) $\ln \text{Imports}_t = A_1 + A_2 \ln \text{GDP}_t$
(2) $\ln \text{Imports}_t = B_1 + B_2 \ln \text{WPI}_t$
(3) $\ln \text{GDP}_t = C_1 + C_2 \ln \text{WPI}_t$
On the basis of these regressions, what can you say about the nature of multicollinearity in the data?


§Peter Kennedy, A Guide to Econometrics, 4th ed., MIT Press, Cambridge, Mass., 1998, p. 187.

‖This quote, attributed to the late econometrician Zvi Griliches, is obtained from Ernst R. Berndt, The Practice of Econometrics: Classic and Contemporary, Addison Wesley, Reading, Mass., 1991, p. 224.
d. Suppose there is multicollinearity in the data but $\hat{\beta}_2$ and $\hat{\beta}_3$ are individually significant at the 5 percent level and the overall F test is also significant. In this case should we worry about the collinearity problem?

Table 10.13 Imports, GDP at Market Prices, and WPI, 1980–81 to 2008–09

Year       Imports       GDP           WPI
1980-81       12,549       143,762     [illegible]
1981-82       13,608       168,600      40
1982-83       14,293       188,262      41
1983-84       15,831       219,496      45
1984-85       17,134       245,515      49
1985-86       19,658     [illegible]    51
1986-87       20,096     [illegible]    54
1987-88       22,244       354,343      58
1988-89       28,235       421,567      62
1989-90       35,328       486,179      67
1990-91       43,193       568,674      74
1991-92       47,851       653,117      84
1992-93       63,375       748,367      92
1993-94       73,101       859,220     100
1994-95       89,971     1,012,770     113
1995-96      122,678     1,188,012     122
1996-97      138,920     1,368,209     127
1997-98      154,176     1,522,547     133
1998-99      178,332     1,740,985     141
1999-00      215,236     1,936,831     145
2000-01      230,873     2,089,500     156
2001-02      245,200     2,271,984     161
2002-03      297,206     2,454,561     167
2003-04      359,108     2,754,620     176
2004-05      501,065     3,149,407     187
2005-06      660,409     3,706,473     196
2006-07      840,506     4,283,979     206
2007-08    [illegible]   4,947,857     216
2008-09    1,374,436     5,574,448     234

Source: Handbook of Statistics on Indian Economy, 2009–10, RBI, Mumbai, and Handbook of Industrial Policy and Statistics, 2007–08, Office of Economic Advisor, Govt. of India.

Note: Imports and GDP (at market prices) are measured in Rs. crore. WPI is at 1993–94 base year.


10.28. Refer to Exercise 7.19 about the demand function for potatoes in India. 
a. Using the log-linear, or double-log, model, estimate the various auxiliary regressions. How many are there?

b. From these auxiliary regressions, how do you decide which regressor(s) is highly collinear? Which test do you use? Show the details of your calculations.

c. If there is significant collinearity in the data, which variable(s) would you drop to reduce the severity of the collinearity problem? If you do that, what econometric problems do you face?

d. Do you have any suggestions, other than dropping variables, to ameliorate the collinearity problem? Explain.
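For parts (a) and (b), each regressor is regressed on the others and the resulting $R_i^2$ can be tested with an F statistic. A sketch, assuming NumPy and using
$F_i = \dfrac{R_i^2/(k-2)}{(1 - R_i^2)/(n-k+1)}$,
where k counts the intercept plus all explanatory variables of the original model, so each auxiliary regression has k − 2 slope coefficients and n − k + 1 residual degrees of freedom. The helper name and the simulated columns are hypothetical stand-ins for the logged data of Exercise 7.19.

```python
import numpy as np

def auxiliary_f(X):
    """For each column X_i of X (regressors only), regress X_i on an intercept and the
    other columns; return a list of (R_i^2, F_i) pairs."""
    n, p = X.shape            # p = k - 1 explanatory variables
    k = p + 1                 # parameters in the original model, incl. intercept
    out = []
    for i in range(p):
        A = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(A, X[:, i], rcond=None)
        e = X[:, i] - A @ coef
        r2 = 1.0 - e @ e / np.sum((X[:, i] - X[:, i].mean()) ** 2)
        F = (r2 / (k - 2)) / ((1.0 - r2) / (n - k + 1))
        out.append((r2, F))
    return out

rng = np.random.default_rng(5)
a = rng.normal(size=25)
X = np.column_stack([a, a + 0.05 * rng.normal(size=25), rng.normal(size=25)])
for r2, F in auxiliary_f(X):
    print(round(r2, 3), round(F, 1))
```

Each $F_i$ would then be compared with the critical F value for (k − 2, n − k + 1) degrees of freedom.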

10.29. Table 10.14 gives data on new passenger cars sold in the United States as a function of several 
variables. 

a. Develop a suitable linear or log-linear model to estimate a demand function for automobiles in the United States.

b. If you decide to include all the regressors given in the table as explanatory variables, do you expect to 
face the multicollinearity problem? Why? 

c. If you do expect to face the multicollinearity problem, how will you go about resolving the problem? 
State your assumptions clearly and show all the calculations explicitly. 
Table 10.14 Passenger Car Data

Year    Y             X2            X3            X4            X5      X6
1971    10,227        112.0         [illegible]   776.8         4.89     79,367
1972    10,872        111.0         125.3         839.6         4.55     82,153
1973    11,350        [illegible]   133.1         949.8         7.38     85,064
1974     8,775        [illegible]   147.7         1,038.4       8.61     86,794
1975     8,539        127.6         161.2         1,142.8       6.16     85,846
1976     9,994        135.7         170.5         1,252.6       5.22     88,752
1977    11,046        142.9         181.5         1,379.3       5.50     92,017
1978    11,164        153.8         193.3         [illegible]   7.78     96,048
1979    10,559        166.0         217.7         [illegible]  10.25     98,824
1980     8,979        179.3         247.0         1,918.0      11.28     99,303
1981     8,535        190.2         272.3         [illegible]  13.73    100,397
1982     7,980        197.6         286.6         2,261.4      11.20     99,526
1983    [illegible]   202.6         297.4         2,428.1       8.69    100,834
1984    10,394        208.5         307.6         2,670.6       9.65    105,005
1985    11,039        215.2         318.5         2,841.1       7.75    107,150
1986    11,450        224.4         323.4         3,022.1       6.31    109,597

Y = new passenger cars sold (thousands), seasonally unadjusted.
X2 = new cars, Consumer Price Index, 1967 = 100, seasonally unadjusted.
X3 = Consumer Price Index, all items, all urban consumers, 1967 = 100, seasonally unadjusted.
X4 = the personal disposable income (PDI), billions of dollars, unadjusted for seasonal variation.
X5 = the interest rate, percent, finance company paper placed directly.
X6 = the employed civilian labor force (thousands), unadjusted for seasonal variation.

Source: Business Statistics, 1986, A Supplement to the Survey of Current Business, U.S. Department of Commerce.


10.30. To assess the feasibility of a guaranteed annual wage (negative income tax), the Rand Corporation conducted a study to assess the response of labor supply (average hours of work) to increasing hourly wages. The data for this study were drawn from a national sample of 6,000 households with a male head earning less than $15,000 annually. The data were divided into 39 demographic groups for analysis. These data are given in Table 10.15. Because data for four demographic groups were missing for some variables, the data given in the table refer to only 35 demographic groups. The definitions of the various variables used in the analysis are given at the end of the table.*

a. Regress average hours worked during the year on the variables given in the table and interpret your regression.

b. Is there evidence of multicollinearity in the data? How do you know?

c. Compute the variance inflation factors (VIF) and TOL measures for the various regressors.

d. If there is a multicollinearity problem, what remedial action, if any, would you take?

e. What does this study tell us about the feasibility of a negative income tax?

10.31. Table 10.16 gives data on the crime rate in 47 states in the United States for 1960. Try to develop a suitable model to explain the crime rate in relation to the 14 socioeconomic variables given in the table. Pay particular attention to the collinearity problem in developing your model.

10.32. Refer to the Longley data given in Section 10.10. Repeat the regression given in the table there by omitting the data for 1962; that is, run the regression for the period 1947–1961. Compare the two regressions. What general conclusion can you draw from this exercise?

*D. H. Greenberg and M. Kosters, Income Guarantees and the Working Poor, Rand Corporation, R-579-OEO, December 1970.
Table 10.15 Hours of Work and Other Data for 35 Groups

Obs.  Hours  Rate   ERSP  ERNO  NEIN         Assets  Age          DEP          School
1     2157   2.905  1121  291   380          7250    38.5         2.340        10.5
2     2174   2.970  1128  301   398          7744    39.3         2.355        10.5
3     2062   2.350  1214  326   185          3068    40.1         2.851        8.9
4     2111   2.511  1203   49   117          1632    22.4         [illegible]  11.5
5     2134   2.791  1013  594   730          12710   57.7         1.229        8.8
6     2185   3.040  1135  287   382          7706    38.6         2.602        10.7
7     2210   3.222  1100  295   474          9338    39.0         2.187        [illegible]
8     2105   2.493  1180  310   255          4730    [illegible]  2.616        [illegible]
9     2267   2.838  1298  252   431          8317    38.9         2.024        [illegible]
10    2205   2.356   885  264   373          6789    38.8         2.662        [illegible]
11    2121   2.922  1251  328   312          5907    39.8         2.287        10.3
12    2109   2.499  1207  347   [illegible]  5069    39.7         3.193        [illegible]
13    2108   2.796  1036  300   259          4614    38.2         2.040        [illegible]
14    2047   2.453  1213  297   139          1987    40.3         2.545        [illegible]
15    2174   3.582  1141  414   498          10239   40.0         2.064        [illegible]
16    2067   2.909  1805  290   239          4439    39.1         2.301        [illegible]
17    2159   2.511  1075  289   308          5621    39.3         2.486        [illegible]
18    2257   2.516  1093  176   392          7293    37.9         2.042        [illegible]
19    1985   1.423   553  381   146          1866    40.6         3.833        [illegible]
20    2184   3.636  1091  291   560          11240   39.1         2.328        [illegible]
21    2084   2.983  1327  331   296          5653    39.8         2.208        [illegible]
22    2051   2.573  1194  279   172          2806    40.0         2.362        [illegible]
23    2127   3.262  1226  314   408          8042    39.5         [illegible]  [illegible]
24    2102   3.234  1188  414   252          7557    39.8         2.019        [illegible]
25    2098   2.280   973  364   272          4400    40.6         2.661        [illegible]
26    2042   2.304  1085  328   140          1739    41.8         2.444        [illegible]
27    2181   2.912  1072  304   383          7340    39.0         2.537        [illegible]
28    2186   3.015  1122   30   352          7292    37.2         2.046        [illegible]
29    2188   3.010   990  366   374          7325    38.4         2.847        [illegible]
30    2077   1.901   350  209    95          1370    37.4         4.158        [illegible]
31    2196   3.009   947  294   342          6888    37.5         3.047        [illegible]
32    2093   1.899   342  311   120          1425    37.5         4.512        [illegible]
33    2173   2.959  1116  296   387          7625    [illegible]  2.342        [illegible]
34    2179   2.971  1128  312   597          7779    39.4         2.341        [illegible]
35    2200   2.980  1126  204   393          7885    39.2         2.341        [illegible]

Notes: Hours = average hours worked during the year.
Rate = average hourly wage (dollars).
ERSP = average yearly earnings of spouse (dollars).
ERNO = average yearly earnings of other family members (dollars).
NEIN = average yearly nonearned income.
Assets = average family asset holdings (bank account, etc.) (dollars).
Age = average age of respondent.
Dep = average number of dependents.
School = average highest grade of school completed.

Source: D. H. Greenberg and M. Kosters, Income Guarantees and the Working Poor, Rand Corporation, R-579-OEO, December 1970.
Table 10.16 U.S. Crime Data for 47 States in 1960

Observation  R            Age          S            ED           EX0  EX1  LF   M     N    NW   U1           U2           W
1            79.1         151          1            91           58   56   510  950   33   301  108          4            394
2            163.5        143          0            [illegible]  103  95   583  1012  13   102  96           36           557
3            57.8         142          1            89           45   44   533  969   18   219  94           33           318
4            196.9        136          0            121          149  141  577  994   157  80   102          39           673
5            123.4        141          0            127          109  101  591  985   18   30   91           20           578
6            68.2         121          0            110          118  115  547  964   25   44   84           29           689
7            96.3         127          1            [illegible]  82   79   519  982   4    139  97           38           620
8            155.5        131          1            109          115  109  542  969   50   179  79           35           472
9            85.6         157          1            90           65   62   553  955   39   286  81           28           421
10           70.5         140          0            118          71   68   632  1029  7    15   100          24           526
11           167.4        124          0            105          121  116  580  966   101  106  77           35           657
12           84.9         134          0            108          75   71   595  972   47   59   83           31           580
13           [illegible]  128          0            113          67   60   624  972   28   10   77           25           507
14           66.4         135          0            117          62   61   595  986   22   46   [illegible]  [illegible]  529
15           79.8         152          1            87           57   53   530  986   30   72   92           43           405
16           94.6         142          1            88           81   77   497  956   33   321  116          47           427
17           53.9         143          0            110          66   63   537  977   10   6    114          35           487
18           92.9         135          1            104          123  115  537  978   31   170  89           34           631
19           75.0         130          0            116          128  128  536  934   51   24   78           34           627
20           122.5        125          0            108          113  105  567  985   78   94   130          58           626
21           74.2         126          0            108          74   67   602  984   34   12   102          33           557
22           43.9         157          1            89           47   44   512  962   22   423  97           34           288
23           121.6        132          0            96           87   83   564  953   43   92   83           32           513
24           96.8         131          0            116          78   73   574  1038  7    36   142          42           540
25           52.3         130          0            116          63   57   641  984   14   26   70           [illegible]  486
26           199.3        131          0            [illegible]  160  143  631  1071  3    77   102          41           674
27           34.2         135          0            109          69   71   540  965   6    4    80           22           564
28           121.6        152          0            112          82   76   571  1018  10   79   103          28           537
29           104.3        119          0            107          166  157  521  938   168  89   92           36           637
30           69.6         166          1            89           58   54   521  973   46   254  72           26           396
31           37.3         140          0            93           55   54   535  1045  6    200  135          40           453
32           75.4         125          0            109          90   81   586  964   97   82   105          43           617
33           107.2        147          1            104          63   64   560  972   23   95   76           24           462
34           92.3         126          0            118          97   97   542  990   18   21   102          35           589
35           65.3         123          0            102          97   87   526  948   113  76   124          50           572
36           [illegible]  150          0            100          109  98   531  964   9    24   87           38           559
37           83.1         [illegible]  1            87           58   56   638  974   24   349  76           28           382
38           56.6         133          0            104          51   47   599  1024  7    40   99           27           425
39           82.6         149          1            88           61   54   515  953   36   165  86           35           395
40           115.1        145          1            104          82   74   560  981   96   126  88           31           488
41           88.0         148          [illegible]  122          72   66   601  998   9    19   84           20           590
42           54.2         141          0            109          56   54   523  968   4    2    107          37           489
43           82.3         162          1            99           75   70   522  996   40   208  73           27           496
44           103.0        136          0            [illegible]  95   96   574  1012  29   36   111          37           622
45           45.5         139          1            88           46   41   480  968   19   49   135          53           457
46           50.8         126          0            104          106  97   599  989   40   24   78           25           593
47           84.9         130          0            121          90   91   623  1049  3    22   113          40           588


Source: W. Vandaele, “Participation in Illegitimate Activities: Ehrlich Revisited,” in A. Blumstein, J. Cohen, and D. Nagin, eds., Deterrence and Incapacitation, National Academy of Sciences, 1978, pp. 270–335.


Definitions of variables: 
R = crime rate, number of offenses reported to police per million population. 
Age = number of males of age 14-24 per 1,000 population. 
S = indicator variable for southern states (0 = no, 1 = yes). 
ED = mean number of years of schooling times 10 for persons age 25 or older. 
EX0 = 1960 per capita expenditure on police by state and local government.
EX1 = 1959 per capita expenditure on police by state and local government.
LF = labor force participation rate per 1,000 civilian urban males age 14–24.
M = number of males per 1,000 females.
N = state population size in hundred thousands.
NW = number of nonwhites per 1,000 population.
U1 = unemployment rate of urban males per 1,000 of age 14–24.
U2 = unemployment rate of urban males per 1,000 of age 35–39.
W = median value of transferable goods and assets or family income in tens of dollars.
X = the number of families per 1,000 earning below one-half of the median income.
Observation = state (47 states for the year 1960).


Table 10.17 Updated Longley Data, 1959–2005

Observation  Y         X1           X2           X3       X4           X5
1959         64,630    82.908       509,300      3,740    2552         120,287
1960         65,778    84.074       529,500      3,852    2514         121,836
1961         65,746    85.015       548,200      4,714    [illegible]  123,404
1962         66,702    86.186       589,700      3,911    2827         124,864
1963         67,762    87.103       622,200      4,070    [illegible]  127,274
1964         69,305    88.438       668,500      3,786    2738         129,427
1965         71,088    90.055       724,400      3,366    2722         131,541
1966         72,895    92.624       792,900      2,875    3123         133,650
1967         74,372    95.491       838,000      2,975    3446         135,905
1968         75,920    99.56        916,100      2,817    3538         138,171
1969         77,902    104.504      990,700      2,832    3506         140,461
1970         78,678    110.046      1,044,900    4,093    3188         143,070
1971         79,367    115.549      1,134,700    5,016    2816         145,826
1972         82,153    120.556      1,246,800    4,882    2449         148,592
1973         85,064    127.307      1,395,300    4,365    2527         151,476
1974         86,794    138.82       1,515,500    5,156    2229         154,378
1975         85,846    [illegible]  1,651,300    7,929    2180         157,344
1976         88,752    160.68       1,842,100    7,406    2144         160,319
1977         92,017    170.884      2,051,200    6,991    [illegible]  163,377
1978         96,048    182.863      2,316,300    6,202    2007         166,422
1979         98,824    198.077      2,595,300    6,137    2088         169,440
1980         99,303    216.073      2,823,700    7,637    2102         172,437
1981         100,397   236.385      3,161,400    8,273    2142         174,929
1982         99,526    250.798      3,291,500    10,678   [illegible]  177,176
1983         100,834   260.68       3,573,800    10,717   2199         179,234
1984         105,005   270.496      3,969,500    8,539    2209         181,192
1985         107,150   278.759      4,246,800    8,312    2234         183,174
1986         109,597   284.895      4,480,600    8,237    2244         185,284
1987         112,440   292.691      4,757,400    7,425    2257         187,419
1988         114,968   302.68       5,127,400    6,701    2224         189,233
1989         117,342   314.179      5,510,600    6,528    2208         190,862
1990         118,793   326.357      5,837,900    7,047    [illegible]  192,644
1991         117,718   337.747      6,026,300    8,628    2118         194,936
1992         118,492   345.477      6,367,400    9,613    1966         197,205
1993         120,259   353.516      6,689,300    8,940    1760         199,622
1994         123,060   361.026      7,098,400    7,996    1673         201,970
1995         124,900   368.444      7,433,400    7,404    [illegible]  204,420
1996         126,708   375.429      7,851,900    7,236    1502         207,087
1997         129,558   381.663      8,337,300    6,739    1457         209,846
1998         131,463   385.881      8,768,300    6,210    1423         212,638
1999         133,488   391.452      9,302,200    5,880    1380         215,404
2000         136,891   399.986      9,855,900    5,692    1405         218,061
2001         136,933   409.582      10,171,600   6,801    1412         220,800
2002         136,485   416.704      10,500,200   8,378    1425         223,532
2003         137,736   425.553      11,017,600   8,774    1423         226,223
2004         139,252   437.795      11,762,100   8,149    1411         228,892
2005         141,730   451.946      12,502,400   7,591    1378         [illegible]

Source: Department of Labor, Bureau of Labor Statistics and http://siadapp.dmdc.osd.mil/personnel/MILITARY/Miltop.htm.
10.33. Updated Longley data. We have extended the data given in Section 10.10 to include observations from 1959–2005. The new data are in Table 10.17. The data pertain to Y = number of people employed, in thousands; $X_1$ = GNP implicit price deflator; $X_2$ = GNP, millions of dollars; $X_3$ = number of people unemployed in thousands; $X_4$ = number of people in the armed forces in thousands; $X_5$ = noninstitutionalized population over 16 years of age; and $X_6$ = year, equal to 1 in 1959, 2 in 1960, and 47 in 2005.

a. Create scatterplots as suggested in the chapter to assess the relationships between the independent variables. Are there any strong relationships? Do they seem linear?

b. Create a correlation matrix. Which variables seem to be the most related to each other, not including the dependent variable?

c. Run a standard OLS regression to predict the number of people employed in thousands. Do the coefficients on the independent variables behave as you would expect?

d. Based on the above results, do you believe these data suffer from multicollinearity?

*10.34. As cheese ages, several chemical processes take place that determine the taste of the final product. The data given in Table 10.18 pertain to concentrations of various chemicals in a sample of 30 mature cheddar cheeses and subjective measures of taste for each sample. The variables acetic and H2S are the natural logarithm of concentration of acetic acid and hydrogen sulfide, respectively. The variable lactic has not been log-transformed.

a. Draw a scatterplot of the four variables.

b. Perform a bivariate regression of taste on acetic and H2S and interpret your results.

c. Perform a bivariate regression of taste on lactic and H2S, and interpret the results.

d. Perform a multiple regression of taste on acetic, H2S, and lactic. Interpret your results.

e. Knowing what you know about multicollinearity, how would you decide among these regressions?

f. What overall conclusions can you draw from your analysis?


Table 10.18 Chemicals in Cheeses

Obs.  Taste      Acetic     H2S        Lactic
1     12.30000   4.543000   3.135000   0.860000
2     20.90000   5.159000   5.043000   1.530000
3     39.00000   5.366000   5.438000   1.570000
4     47.90000   5.759000   7.496000   1.810000
5     5.600000   4.663000   3.807000   0.990000
6     25.90000   5.697000   7.601000   1.090000
7     37.30000   5.892000   8.726000   1.290000
8     21.90000   6.078000   7.966000   1.780000
9     18.10000   4.898000   3.850000   1.290000
10    21.00000   5.242000   4.174000   1.580000
11    34.90000   5.740000   6.142000   1.680000
12    57.20000   6.446000   7.908000   1.900000
13    0.700000   4.477000   2.996000   1.060000
14    25.90000   5.236000   4.942000   1.300000
15    54.90000   6.151000   6.752000   1.520000
16    40.90000   3.365000   9.588000   1.740000
17    15.90000   4.787000   3.912000   1.160000
18    6.400000   5.142000   4.700000   1.490000
19    18.00000   5.247000   6.174000   1.630000
20    38.90000   5.438000   9.064000   1.990000
21    14.00000   4.564000   4.949000   1.150000
22    15.20000   5.298000   5.220000   1.330000
23    32.00000   5.455000   9.242000   1.440000
24    56.70000   5.855000   10.19900   2.010000
25    16.80000   5.366000   3.664000   1.310000
26    11.60000   6.043000   3.219000   1.460000
27    26.50000   6.458000   6.962000   1.720000
28    0.700000   5.328000   3.912000   1.250000
29    13.40000   5.802000   6.685000   1.080000
30    5.500000   6.176000   4.787000   1.250000

*Optional.

Source: http://lib.stat.cmu.edu/DASL/Datafiles/Cheese.html
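A sketch for parts (b) to (e) of Exercise 10.34, assuming NumPy: the 30 observations are typed in from Table 10.18 as printed, each specification is fit by OLS with an intercept, and the correlation matrix of the three regressors is printed for part (e). The helper name `fit` is hypothetical.

```python
import numpy as np

# Table 10.18 as printed: taste, acetic (log), H2S (log), lactic
data = np.array([
    [12.3, 4.543, 3.135, 0.86], [20.9, 5.159, 5.043, 1.53], [39.0, 5.366, 5.438, 1.57],
    [47.9, 5.759, 7.496, 1.81], [5.6, 4.663, 3.807, 0.99], [25.9, 5.697, 7.601, 1.09],
    [37.3, 5.892, 8.726, 1.29], [21.9, 6.078, 7.966, 1.78], [18.1, 4.898, 3.850, 1.29],
    [21.0, 5.242, 4.174, 1.58], [34.9, 5.740, 6.142, 1.68], [57.2, 6.446, 7.908, 1.90],
    [0.7, 4.477, 2.996, 1.06], [25.9, 5.236, 4.942, 1.30], [54.9, 6.151, 6.752, 1.52],
    [40.9, 3.365, 9.588, 1.74], [15.9, 4.787, 3.912, 1.16], [6.4, 5.142, 4.700, 1.49],
    [18.0, 5.247, 6.174, 1.63], [38.9, 5.438, 9.064, 1.99], [14.0, 4.564, 4.949, 1.15],
    [15.2, 5.298, 5.220, 1.33], [32.0, 5.455, 9.242, 1.44], [56.7, 5.855, 10.199, 2.01],
    [16.8, 5.366, 3.664, 1.31], [11.6, 6.043, 3.219, 1.46], [26.5, 6.458, 6.962, 1.72],
    [0.7, 5.328, 3.912, 1.25], [13.4, 5.802, 6.685, 1.08], [5.5, 6.176, 4.787, 1.25],
])
taste, acetic, h2s, lactic = data.T

def fit(y, *cols):
    """OLS of y on an intercept and the given columns; returns (coefficients, R^2)."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return beta, 1.0 - e @ e / np.sum((y - y.mean()) ** 2)

for label, cols in [("acetic, H2S", (acetic, h2s)),
                    ("lactic, H2S", (lactic, h2s)),
                    ("acetic, H2S, lactic", (acetic, h2s, lactic))]:
    beta, r2 = fit(taste, *cols)
    print(label, beta.round(3), round(r2, 3))

print(np.corrcoef([acetic, h2s, lactic]).round(3))   # regressor correlations
```

Standard errors and t ratios would still be needed to judge individual significance; this sketch only reports point estimates and $R^2$.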


Key to Multiple Choice Questions 


1. (a) 2. (c) 3. (b) 4. (c) 5. (a) 6. (b) 7. (c) 8. (a) 9. (d)
10. (c) 11. (a) 12. (c) 13. [illegible] 14. (b) 15. (d) 16. (a) 17. (d) 18. (d)
19. (c) 20. (b)


CHAPTER 11

Heteroscedasticity: What Happens if the Error Variance is Nonconstant?


An important assumption of the classical linear regression model (Assumption 4) is that the disturbances $u_i$ appearing in the population regression function are homoscedastic; that is, they all have the same variance. In this chapter we examine the validity of this assumption and find out what happens if this assumption is not fulfilled. As in Chapter 10, we seek answers to the following questions:


1. What is the nature of heteroscedasticity? 
2. What are its consequences? 

3. How does one detect it? 

4. What are the remedial measures? 


11.1 The Nature of Heteroscedasticity


As noted in Chapter 3, one of the important assumptions of the classical linear regression model is that the variance of each disturbance term $u_i$, conditional on the chosen values of the explanatory variables, is some constant number equal to $\sigma^2$. This is the assumption of homoscedasticity, or equal (homo) spread (scedasticity), that is, equal variance. Symbolically,
$$E(u_i^2) = \sigma^2 \qquad i = 1, 2, \ldots, n \tag{11.1.1}$$
Diagrammatically, in the two-variable regression model homoscedasticity can be shown as in Figure 3.4, which, for convenience, is reproduced as Figure 11.1. As Figure 11.1 shows, the conditional variance of $Y_i$ (which is equal to that of $u_i$), conditional upon the given $X_i$, remains the same regardless of the values taken by the variable X.
In contrast, consider Figure 11.2, which shows that the conditional variance of $Y_i$ increases as X increases. Here, the variances of $Y_i$ are not the same. Hence, there is heteroscedasticity. Symbolically,
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Density 


Savings 


A 
Bı + B2X; 


Income X 
Figure 11.1 Homoscedastic disturbances. 


Density 


Savings 


In Come X 


Figure 11.2 Heteroscedastic disturbances. 


E(u?) = oP (11422) 
Notice the subscript of a”, which reminds us that the conditional variances of u, (= conditional variances of 
Y) are no longer constant. 

To make the difference between homoscedasticity and heteroscedasticity clear, assume that in the
two-variable model $Y_i = \beta_1 + \beta_2 X_i + u_i$, Y represents savings and X represents income. Figures 11.1 and
11.2 show that as income increases, savings on the average also increase. But in Figure 11.1 the variance of 
savings remains the same at all levels of income, whereas in Figure 11.2 it increases with income. It seems 
that in Figure 11.2 the higher-income families on the average save more than the lower-income families, but 
there is also more variability in their savings. 

There are several reasons why the variances of $u_i$ may vary, some of which are as follows.¹

1. Following the error-learning models, as people learn, their errors of behavior become smaller over time
or the number of errors becomes more consistent. In this case, $\sigma_i^2$ is expected to decrease. As an example,
consider Figure 11.3, which relates the number of typing errors made in a given time period on a test to the
hours put in typing practice. As Figure 11.3 shows, as the number of hours of typing practice increases, the
average number of typing errors as well as their variances decrease.


¹See Stefan Valavanis, Econometrics, McGraw-Hill, New York, 1959, p. 48.
Figure 11.3 Illustration of heteroscedasticity (density of typing errors plotted against hours of typing practice, X).


2. As incomes grow, people have more discretionary income and hence more scope for choice about the
disposition of their income.² Hence, $\sigma_i^2$ is likely to increase with income. Thus in the regression of savings on
income one is likely to find $\sigma_i^2$ increasing with income (as in Figure 11.2) because people have more choices
about their savings behavior. Similarly, companies with larger profits are generally expected to show greater
variability in their dividend policies than companies with lower profits. Also, growth-oriented companies are
likely to show more variability in their dividend payout ratio than established companies.

3. As data collecting techniques improve, $\sigma_i^2$ is likely to decrease. Thus, banks that have sophisticated
data processing equipment are likely to commit fewer errors in the monthly or quarterly statements of their
customers than banks without such facilities.

4. Heteroscedasticity can also arise as a result of the presence of outliers. An outlying observation, or
outlier, is an observation that is much different (either very small or very large) in relation to the observations
in the sample. More precisely, an outlier is an observation from a different population to that generating the
remaining sample observations.³ The inclusion or exclusion of such an observation, especially if the sample
size is small, can substantially alter the results of regression analysis.

As an example, consider the scattergram given in Figure 11.4. Based on the data given in Table 11.9 
in Exercise 11.22, this figure plots percent rate of change of stock prices (Y) and consumer prices (X) for 
the post-World War II period through 1969 for 20 countries. In this figure the observation on Y and X for 
Chile can be regarded as an outlier because the given Y and X values are much larger than for the rest of the 
countries. In situations such as this, it would be hard to maintain the assumption of homoscedasticity. In 
Exercise 11.22, you are asked to find out what happens to the regression results if the observations for Chile 
are dropped from the analysis. 

5. Another source of heteroscedasticity arises from violating Assumption 9 of the classical linear regression
model (CLRM), namely, that the regression model is correctly specified. Although we will discuss the topic
of specification errors more fully in Chapter 13, very often what looks like heteroscedasticity may be due
to the fact that some important variables are omitted from the model. Thus, in the demand function for
a commodity, if we do not include the prices of commodities complementary to or competing with the
commodity in question (the omitted variable bias), the residuals obtained from the regression may give the
distinct impression that the error variance may not be constant. But if the omitted variables are included in
the model, that impression may disappear.

Figure 11.4 The relationship between stock prices and consumer prices (vertical axis: stock prices, % change; horizontal axis: consumer prices, % change).

²As Valavanis puts it, “Income grows, and people now barely discern dollars whereas previously they discerned dimes,”
ibid., p. 48.

³I am indebted to Michael McAleer for pointing this out to me.

As a concrete example, recall our study of advertising impressions retained (Y) in relation to advertising
expenditure (X). (See Exercise 8.32.) If you regress Y on X only and observe the residuals from this regression,
you will see one pattern, but if you regress Y on X and X², you will see another pattern, which can be seen
clearly from Figure 11.5. We have already seen that X² belongs in the model. (See Exercise 8.32.)

6. Another source of heteroscedasticity is skewness in the distribution of one or more regressors included 
in the model. Examples are economic variables such as income, wealth, and education. It is well known that 
the distribution of income and wealth in most societies is uneven, with the bulk of the income and wealth 
being owned by a few at the top. 

7. Other sources of heteroscedasticity: As David Hendry notes, heteroscedasticity can also arise because
of (1) incorrect data transformation (e.g., ratio or first difference transformations) and (2) incorrect functional
form (e.g., linear versus log-linear models).⁴

Note that the problem of heteroscedasticity is likely to be more common in cross-sectional than in time
series data. In cross-sectional data, one usually deals with members of a population at a given point in 
time, such as individual consumers or their families, firms, industries, or geographical subdivisions such 
as state, country, city, etc. Moreover, these members may be of different sizes, such as small, medium, or 
large firms or low, medium, or high income. In time series data, on the other hand, the variables tend to be of 
similar orders of magnitude because one generally collects the data for the same entity over a period of time. 
Examples are gross domestic product (GDP), consumption expenditure, savings, or employment in India, 
say, for the period 1950-51 to 2009-10. 


⁴David F. Hendry, Dynamic Econometrics, Oxford University Press, 1995, p. 45.
Figure 11.5 Residuals from the regression of (a) impressions on advertising expenditure (Adexp) and (b) impressions on Adexp
and Adexp².


As an illustration of heteroscedasticity likely to be encountered in cross-sectional analysis, consider Table
11.1. This table gives data on average wages per employee and labor productivity in 11 manufacturing
industry groups (two-digit NIC code) for the year 1998-99. The data are averaged across three states of India,
namely, Andhra Pradesh, Bihar, and Gujarat.

Although the industries differ in their output composition, Table 11.1 shows clearly that, on the average,
firms with higher productivity levels pay higher wages. As an example, firms manufacturing leather products
paid an average of Rs. 39,655, whereas those manufacturing furniture paid Rs. 24,711. But notice that there
is considerable variability in wages among the various states, as indicated by the estimated standard deviations
of wages per worker. This can be seen from Figure 11.6, which plots the standard deviation of wages against
average wages per employee. As can be seen clearly, on average, the standard deviation of wages per
employee increases with the average value of wages per employee.


Table 11.1 Wages per employee (Rs.) and Productivity (Rs.) in Manufacturing Industries in India: 1998-99

Industry | Wages per employee (Rs.) | Productivity (Rs.) | Standard deviation of wages (Rs.)
Manufacture of Textiles | 31,660.94 | [?]61,506.15 | 2,527.91
Tanning and dressing of leather; manufacturing of leather products | 39,654.76 | 1,027,032.58 | 10,082.87
Manufacturing of wood and wood products except furniture | 16,394.52 | 455,223.97 | 5,763.08
Manufacturing of paper and paper products | [illegible] | 687,717.50 | 17,663.02
Publishing, printing and reproduction of recorded media | 56,247.38 | 929,562.01 | 8,756.23
Manufacturing of non-metallic mineral products | 21,316.48 | 538,554.90 | 10,311.67
Manufacturing of fabricated metal products except machinery and equipment | 21,566.75 | 549,645.86 | [illegible]
Manufacturing of machinery and equipment not included elsewhere | 39,175.69 | 749,620.67 | 20,937.34
Manufacturing of electrical machinery and apparatus | 47,845.94 | 935,242.07 | 38,567.69
Manufacturing of other transport equipment (other than motor vehicles and trailers) | 53,601.11 | 785,937.34 | 22,479.52
Manufacturing of furniture | 24,711.36 | 306,195.84 | 19,807.13


Source: Annual Survey of Industries, 1998-99, Central Statistical Organization, Government of India. 
Figure 11.6 Standard deviation of wages to workers and mean wages to workers.


11.2 OLS Estimation in the Presence of Heteroscedasticity 


What happens to ordinary least squares (OLS) estimators and their variances if we introduce heteroscedasticity
by letting $E(u_i^2) = \sigma_i^2$ but retain all other assumptions of the classical model? To answer this question,
let us revert to the two-variable model:


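$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

Applying the usual formula, the OLS estimator of $\beta_2$ is

$$\hat\beta_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2} \qquad (11.2.1)$$

where, as usual, $x_i = X_i - \bar X$ and $y_i = Y_i - \bar Y$. Its variance, however, is now given by the following
expression (see Appendix 11A, Section 11A.1):

$$\operatorname{var}(\hat\beta_2) = \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2} \qquad (11.2.2)$$

which is obviously different from the usual variance formula obtained under the assumption of homoscedasticity,
namely,

$$\operatorname{var}(\hat\beta_2) = \frac{\sigma^2}{\sum x_i^2} \qquad (11.2.3)$$

Of course, if $\sigma_i^2 = \sigma^2$ for each i, the two formulas will be identical. (Why?)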
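To make Eqs. (11.2.1) to (11.2.3) concrete, here is a minimal NumPy sketch. It assumes the $\sigma_i^2$ are known (they rarely are in practice); the X values, the choice $\sigma_i^2 = 0.5X_i^2$, and the helper names are made up purely for illustration.

```python
import numpy as np

def ols_slope(X, Y):
    """OLS slope estimator of Eq. (11.2.1), in deviation form."""
    x = X - X.mean()
    y = Y - Y.mean()
    return np.sum(x * y) / np.sum(x**2)

def var_slope_het(X, sigma2_i):
    """Variance of the OLS slope when var(u_i) = sigma_i^2, Eq. (11.2.2)."""
    x = X - X.mean()
    return np.sum(x**2 * sigma2_i) / np.sum(x**2) ** 2

def var_slope_hom(X, sigma2):
    """Usual homoscedastic variance of the OLS slope, Eq. (11.2.3)."""
    x = X - X.mean()
    return sigma2 / np.sum(x**2)

# Made-up design: error variance proportional to X_i^2 (assumed known).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
sigma2_i = 0.5 * X**2

print(ols_slope(X, 1.0 + 2.0 * X))   # a noise-free line recovers slope 2.0

# Unequal sigma_i^2: the two formulas disagree. For comparison only,
# sigma^2 in Eq. (11.2.3) is set to the average of the sigma_i^2.
print(var_slope_het(X, sigma2_i), var_slope_hom(X, sigma2_i.mean()))

# Equal sigma_i^2 = sigma^2 = 2: the two formulas coincide.
equal = np.full_like(X, 2.0)
print(np.isclose(var_slope_het(X, equal), var_slope_hom(X, 2.0)))  # True
```

The last line answers the "(Why?)" numerically: when every $\sigma_i^2$ equals $\sigma^2$, it factors out of the numerator of Eq. (11.2.2) and one $\sum x_i^2$ cancels.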
Recall that $\hat\beta_2$ is the best linear unbiased estimator (BLUE) if the assumptions of the classical model,
including homoscedasticity, hold. Is it still BLUE when we drop only the homoscedasticity assumption and
replace it with the assumption of heteroscedasticity? It is easy to prove that $\hat\beta_2$ is still linear and unbiased. As a
matter of fact, as shown in Appendix 3A, Section 3A.2, to establish the unbiasedness of $\hat\beta_2$ it is not necessary
that the disturbances ($u_i$) be homoscedastic. In fact, the variance of $u_i$, homoscedastic or heteroscedastic,
plays no part in the determination of the unbiasedness property. Recall that in Appendix 3A, Section 3A.7,
we showed that $\hat\beta_2$ is a consistent estimator under the assumptions of the classical linear regression model.
Although we will not prove it, it can be shown that $\hat\beta_2$ is a consistent estimator despite heteroscedasticity;
that is, as the sample size increases indefinitely, $\hat\beta_2$ converges to the true value $\beta_2$. Furthermore, it
can also be shown that under certain conditions (called regularity conditions), $\hat\beta_2$ is asymptotically normally
distributed. Of course, what we have said about $\hat\beta_2$ also holds true of other parameters of a multiple regression
model.

Granted that $\hat\beta_2$ is still linear, unbiased, and consistent, is it “efficient” or “best”? That is, does it have
minimum variance in the class of unbiased estimators? And is that minimum variance given by Eq. (11.2.2)?
The answer is no to both questions: $\hat\beta_2$ is no longer best, and the minimum variance is not given by
Eq. (11.2.2). Then what is BLUE in the presence of heteroscedasticity? The answer is given in the following
section.


11.3 The Method of Generalized Least Squares (GLS) 


Why is the usual OLS estimator of $\beta_2$ given in Eq. (11.2.1) not best, although it is still unbiased? Intuitively,
we can see the reason from Table 11.1. As the table shows, there is considerable variability in the wages per 
employee between industry groups. If we were to regress per-employee compensation on productivity, we 
would like to make use of the knowledge that there is considerable interclass variability in earnings. Ideally, 
we would like to devise the estimating scheme in such a manner that observations coming from populations 
with greater variability are given less weight than those coming from populations with smaller variability. 
Examining Table 11.1, we would like to weight observations coming from industry groups such as textiles 
and wood and wood products more heavily than those coming from electrical machinery and apparatus, for 
the former are more closely clustered around their mean values (lower standard deviation) than the latter 
(with higher standard deviation), thereby enabling us to estimate the population regression function (PRF) 
more accurately. 

Unfortunately, the usual OLS method does not follow this strategy and therefore does not make use of the 
“information” contained in the unequal variability of the dependent variable Y, say, wages per employee of 
Table 11.1: It assigns equal weight or importance to each observation. But a method of estimation, known as 
generalized least squares (GLS), takes such information into account explicitly and is therefore capable of 
producing estimators that are BLUE. To see how this is accomplished, let us continue with the now-familiar 
two-variable model: 


$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad (11.3.1)$$

which for ease of algebraic manipulation we write as

$$Y_i = \beta_1 X_{0i} + \beta_2 X_i + u_i \qquad (11.3.2)$$

where $X_{0i} = 1$ for each i. The reader can see that these two formulations are identical.




Now assume that the heteroscedastic variances $\sigma_i^2$ are known. Divide Eq. (11.3.2) through by $\sigma_i$ to obtain

$$\frac{Y_i}{\sigma_i} = \beta_1\left(\frac{X_{0i}}{\sigma_i}\right) + \beta_2\left(\frac{X_i}{\sigma_i}\right) + \left(\frac{u_i}{\sigma_i}\right) \qquad (11.3.3)$$

which for ease of exposition we write as

$$Y_i^* = \beta_1^* X_{0i}^* + \beta_2^* X_i^* + u_i^* \qquad (11.3.4)$$

where the starred, or transformed, variables are the original variables divided by (the known) $\sigma_i$. We use
the notation $\beta_1^*$ and $\beta_2^*$, the parameters of the transformed model, to distinguish them from the usual OLS
parameters $\beta_1$ and $\beta_2$.

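What is the purpose of transforming the original model? To see this, notice the following feature of the
transformed error term $u_i^*$:

$$\begin{aligned}
\operatorname{var}(u_i^*) &= E(u_i^*)^2 = E\left(\frac{u_i}{\sigma_i}\right)^2 && \text{since } E(u_i^*) = 0 \\
&= \frac{1}{\sigma_i^2}\,E(u_i^2) && \text{since } \sigma_i^2 \text{ is known} \\
&= \frac{1}{\sigma_i^2}\,(\sigma_i^2) && \text{since } E(u_i^2) = \sigma_i^2 \\
&= 1
\end{aligned} \qquad (11.3.5)$$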

which is a constant. That is, the variance of the transformed disturbance term $u_i^*$ is now homoscedastic. Since
we are still retaining the other assumptions of the classical model, the finding that it is $u_i^*$ that is homoscedastic
suggests that if we apply OLS to the transformed model (11.3.3) it will produce estimators that are
BLUE. In short, the estimated $\beta_1^*$ and $\beta_2^*$ are now BLUE and not the OLS estimators $\hat\beta_1$ and $\hat\beta_2$.
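Eq. (11.3.5) can be checked by simulation. The sketch below assumes normally distributed disturbances and five arbitrary values of $\sigma_i$; it draws many $u_i$ for each observation, divides them by $\sigma_i$, and compares the sample variances before and after the transformation.

```python
import numpy as np

rng = np.random.default_rng(0)                  # fixed seed for reproducibility
n_reps = 200_000
sigma_i = np.array([0.5, 1.0, 2.0, 4.0, 8.0])   # assumed known, unequal sigma_i

# Column i holds n_reps draws of u_i ~ N(0, sigma_i^2).
u = rng.normal(loc=0.0, scale=sigma_i, size=(n_reps, sigma_i.size))
u_star = u / sigma_i                            # transformed disturbances u_i* = u_i / sigma_i

print(u.var(axis=0).round(2))        # roughly sigma_i^2: 0.25, 1, 4, 16, 64
print(u_star.var(axis=0).round(2))   # roughly 1 for every i: homoscedastic
```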

This procedure of transforming the original variables in such a way that the transformed variables satisfy 
the assumptions of the classical model and then applying OLS to them is known as the method of generalized 
least squares (GLS). In short, GLS is OLS on the transformed variables that satisfy the standard least-squares 
assumptions. The estimators thus obtained are known as GLS estimators, and it is these estimators that are 
BLUE. 
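Since GLS here is simply OLS on the starred variables, a minimal sketch needs only a least-squares solver. The function name `gls_known_sigma` and the simulated data (with $\sigma_i = 0.5X_i$) are illustrative assumptions. Note that the transformed model has no separate intercept: the column $X_{0i}^* = 1/\sigma_i$ carries $\beta_1^*$.

```python
import numpy as np

def gls_known_sigma(X, Y, sigma_i):
    """GLS for Y_i = b1*X0i + b2*X_i + u_i with known sigma_i (hypothetical helper).

    Divides every variable, including X0i = 1, by sigma_i as in Eq. (11.3.3)
    and runs OLS on the transformed model (11.3.4) with no extra intercept.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    sigma_i = np.asarray(sigma_i, dtype=float)
    X0 = np.ones_like(X)                               # X0i = 1 for each i
    Z = np.column_stack([X0 / sigma_i, X / sigma_i])   # columns X0i*, Xi*
    Y_star = Y / sigma_i                               # Yi*
    coef, *_ = np.linalg.lstsq(Z, Y_star, rcond=None)
    return coef                                        # [b1*, b2*]

rng = np.random.default_rng(1)
X = np.linspace(1.0, 10.0, 50)
sigma_i = 0.5 * X                          # assumed: sd of u_i proportional to X_i
Y = 1.0 + 2.0 * X + rng.normal(scale=sigma_i)
print(gls_known_sigma(X, Y, sigma_i))      # estimates near [1, 2]
```

As a cross-check, `np.polyfit(X, Y, 1, w=1/sigma_i)` gives the same coefficients (in reverse order), because NumPy's `polyfit` applies its weights to the unsquared residuals, which is exactly the division by $\sigma_i$.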

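The actual mechanics of estimating $\beta_1^*$ and $\beta_2^*$ are as follows. First, we write down the sample regression
function (SRF) of Eq. (11.3.3)

$$\frac{Y_i}{\sigma_i} = \hat\beta_1^*\left(\frac{X_{0i}}{\sigma_i}\right) + \hat\beta_2^*\left(\frac{X_i}{\sigma_i}\right) + \left(\frac{\hat u_i}{\sigma_i}\right)$$

or

$$Y_i^* = \hat\beta_1^* X_{0i}^* + \hat\beta_2^* X_i^* + \hat u_i^* \qquad (11.3.6)$$

Now, to obtain the GLS estimators, we minimize

$$\sum \hat u_i^{*2} = \sum \left(Y_i^* - \hat\beta_1^* X_{0i}^* - \hat\beta_2^* X_i^*\right)^2$$

that is,

$$\sum \left(\frac{\hat u_i}{\sigma_i}\right)^2 = \sum \left[\left(\frac{Y_i}{\sigma_i}\right) - \hat\beta_1^*\left(\frac{X_{0i}}{\sigma_i}\right) - \hat\beta_2^*\left(\frac{X_i}{\sigma_i}\right)\right]^2 \qquad (11.3.7)$$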
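The actual mechanics of minimizing Eq. (11.3.7) follow the standard calculus techniques and are given in
Appendix 11A, Section 11A.2. As shown there, the GLS estimator of $\beta_2$ is

$$\hat\beta_2^* = \frac{\left(\sum w_i\right)\left(\sum w_i X_i Y_i\right) - \left(\sum w_i X_i\right)\left(\sum w_i Y_i\right)}{\left(\sum w_i\right)\left(\sum w_i X_i^2\right) - \left(\sum w_i X_i\right)^2} \qquad (11.3.8)$$

and its variance is given by

$$\operatorname{var}(\hat\beta_2^*) = \frac{\sum w_i}{\left(\sum w_i\right)\left(\sum w_i X_i^2\right) - \left(\sum w_i X_i\right)^2} \qquad (11.3.9)$$

where $w_i = 1/\sigma_i^2$.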
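Eqs. (11.3.8) and (11.3.9) translate directly into sums of weighted data. The sketch below uses a hypothetical function name, made-up data, and $\sigma_i^2 = 0.5X_i^2$ assumed known. It also cross-checks the formulas against the matrix expression $(X'WX)^{-1}$ with $W = \operatorname{diag}(w_i)$, which the text does not use here but which must give the same numbers.

```python
import numpy as np

def wls_slope_and_var(X, Y, sigma2_i):
    """GLS/WLS slope, Eq. (11.3.8), and its variance, Eq. (11.3.9),
    with weights w_i = 1 / sigma_i^2 (sigma_i^2 assumed known)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w = 1.0 / np.asarray(sigma2_i, dtype=float)
    Sw, SwX, SwY = w.sum(), (w * X).sum(), (w * Y).sum()
    SwXY, SwX2 = (w * X * Y).sum(), (w * X**2).sum()
    denom = Sw * SwX2 - SwX**2
    b2_star = (Sw * SwXY - SwX * SwY) / denom   # Eq. (11.3.8)
    var_b2_star = Sw / denom                    # Eq. (11.3.9)
    return b2_star, var_b2_star

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.9, 5.2, 6.8, 9.5, 10.4, 13.9])   # made-up data
sigma2_i = 0.5 * X**2
b2_star, v = wls_slope_and_var(X, Y, sigma2_i)

# Cross-check with the matrix form: beta = (D'WD)^{-1} D'WY, cov = (D'WD)^{-1}.
D = np.column_stack([np.ones_like(X), X])
W = np.diag(1.0 / sigma2_i)
cov = np.linalg.inv(D.T @ W @ D)
beta = cov @ D.T @ W @ Y
print(np.isclose(b2_star, beta[1]), np.isclose(v, cov[1, 1]))   # True True
```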
Difference between OLS and GLS 
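Recall from Chapter 3 that in OLS we minimize

$$\sum \hat u_i^2 = \sum \left(Y_i - \hat\beta_1 - \hat\beta_2 X_i\right)^2 \qquad (11.3.10)$$

but in GLS we minimize the expression (11.3.7), which can also be written as

$$\sum w_i \hat u_i^2 = \sum w_i \left(Y_i - \hat\beta_1^* X_{0i} - \hat\beta_2^* X_i\right)^2 \qquad (11.3.11)$$

where $w_i = 1/\sigma_i^2$ (verify that Eq. [11.3.11] and Eq. [11.3.7] are identical).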

Thus, in GLS we minimize a weighted sum of residual squares with $w_i = 1/\sigma_i^2$ acting as the weights, but
in OLS we minimize an unweighted or (what amounts to the same thing) equally weighted residual sum of
squares (RSS). As Eq. (11.3.7) shows, in GLS the weight assigned to each observation is inversely proportional
to its $\sigma_i$; that is, observations coming from a population with larger $\sigma_i$ will get relatively smaller weight
and those from a population with smaller $\sigma_i$ will get proportionately larger weight in minimizing the RSS
(11.3.11). To see the difference between OLS and GLS clearly, consider the hypothetical scattergram given
in Figure 11.7.

In the (unweighted) OLS, each $\hat u_i^2$ associated with points A, B, and C will receive the same weight in
minimizing the RSS. Obviously, in this case the $\hat u_i^2$ associated with point C will dominate the RSS. But in
GLS the extreme observation C will get relatively smaller weight than the other two observations. As noted
earlier, this is the right strategy, for in estimating the population regression function (PRF) more reliably we
would like to give more weight to observations that are closely clustered around their (population) mean than
to those that are widely scattered about.

Since Eq. (11.3.11) minimizes a weighted RSS, it is appropriately known as weighted least squares 
(WLS), and the estimators thus obtained and given in Eqs. (11.3.8) and (11.3.9) are known as WLS 
estimators. But WLS is just a special case of the more general estimating technique, GLS. In the context of 
heteroscedasticity, one can treat the two terms WLS and GLS interchangeably. In later chapters we will come 
across other special cases of GLS. 

In passing, note that if $w_i = w$, a constant for all i, $\hat\beta_2^*$ is identical with $\hat\beta_2$, and var($\hat\beta_2^*$) is identical with
the usual (i.e., homoscedastic) var($\hat\beta_2$) given in Eq. (11.2.3), which should not be surprising. (Why?) (See
Exercise 11.8.)
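The claim that constant weights reproduce OLS can be checked numerically: with $w_i = w$, the factor $w^2$ cancels from the numerator and denominator of Eq. (11.3.8). A small sketch with made-up data and a hypothetical helper name:

```python
import numpy as np

def wls_slope(X, Y, w):
    """WLS slope of Eq. (11.3.8) for arbitrary positive weights w_i."""
    Sw, SwX, SwY = w.sum(), (w * X).sum(), (w * Y).sum()
    return (Sw * (w * X * Y).sum() - SwX * SwY) / (Sw * (w * X**2).sum() - SwX**2)

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.9, 5.2, 6.8, 9.5, 10.4, 13.9])   # made-up data
x, y = X - X.mean(), Y - Y.mean()
ols = np.sum(x * y) / np.sum(x**2)               # Eq. (11.2.1)

w_const = np.full_like(X, 0.25)                  # w_i = w for all i
print(np.isclose(wls_slope(X, Y, w_const), ols))  # True: the constant w cancels
```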
Figure 11.7 Hypothetical scattergram.


11.4 Consequences of Using OLS in the Presence of Heteroscedasticity 


As we have seen, both $\hat\beta_2^*$ and $\hat\beta_2$ are (linear) unbiased estimators: In repeated sampling, on the average,
$\hat\beta_2^*$ and $\hat\beta_2$ will equal the true $\beta_2$. But we know that it is $\hat\beta_2^*$ that is
efficient, that is, has the smallest variance. What happens to our confidence intervals, hypothesis testing, and
other procedures if we continue to use the OLS estimator $\hat\beta_2$? We distinguish two cases.


OLS Estimation Allowing for Heteroscedasticity 


Suppose we use $\hat\beta_2$ and use the variance formula given in Eq. (11.2.2), which takes into account heteroscedasticity
explicitly. Using this variance, and assuming the $\sigma_i^2$ are known, can we establish confidence intervals
and test hypotheses with the usual t and F tests? The answer generally is no because it can be shown that
var($\hat\beta_2^*$) ≤ var($\hat\beta_2$),⁵ which means that confidence intervals based on the latter will be unnecessarily wide. As a
result, the t and F tests are likely to give us inaccurate results in that var($\hat\beta_2$) is overly large and what appears
to be a statistically insignificant coefficient (because the t value is smaller than what is appropriate) may in
fact be significant if the correct confidence intervals were established on the basis of the GLS procedure.
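The inequality var($\hat\beta_2^*$) ≤ var($\hat\beta_2$) can be illustrated by evaluating Eq. (11.2.2) and Eq. (11.3.9) side by side for known $\sigma_i^2$. The design below (20 equally spaced X values and $\sigma_i^2 = X_i^2$) is an assumption chosen only for illustration; the size of the gap depends on X and on the $\sigma_i^2$.

```python
import numpy as np

X = np.linspace(1.0, 10.0, 20)
sigma2_i = X**2                   # assumed: var(u_i) proportional to X_i^2

# Correct variance of the OLS slope under heteroscedasticity, Eq. (11.2.2).
x = X - X.mean()
var_ols_het = np.sum(x**2 * sigma2_i) / np.sum(x**2) ** 2

# Variance of the GLS slope, Eq. (11.3.9), with w_i = 1 / sigma_i^2.
w = 1.0 / sigma2_i
var_gls = w.sum() / (w.sum() * (w * X**2).sum() - (w * X).sum() ** 2)

print(var_gls <= var_ols_het)     # True: GLS is never less efficient than OLS
print(var_ols_het / var_gls)      # efficiency loss of OLS for this design
```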


OLS Estimation Disregarding Heteroscedasticity 


The situation can become serious if we not only use $\hat\beta_2$ but also continue to use the usual (homoscedastic)
variance formula given in Eq. (11.2.3) even if heteroscedasticity is present or suspected. Note that this is
the more likely case of the two we discuss here, because running a standard OLS regression package and
ignoring (or being ignorant of) heteroscedasticity will yield the variance of $\hat\beta_2$ as given in Eq. (11.2.3). First of
all, var($\hat\beta_2$) given in Eq. (11.2.3) is a biased estimator of var($\hat\beta_2$) given in Eq. (11.2.2); that is, on the average
it overestimates or underestimates the latter, and in general we cannot tell whether the bias is positive (overestimation)
or negative (underestimation) because it depends on the nature of the relationship between $\sigma_i^2$ and
the values taken by the explanatory variable X, as can be seen clearly from Eq. (11.2.2) (see Exercise 11.9).
The bias arises from the fact that $\hat\sigma^2$, the conventional estimator of $\sigma^2$, namely, $\sum \hat u_i^2/(n-2)$, is no longer an
unbiased estimator of the latter when heteroscedasticity is present (see Appendix 11A.3). As a result, we can
no longer rely on the conventionally computed confidence intervals and the conventionally employed t and F
tests.⁶ In short, if we persist in using the usual testing procedures despite heteroscedasticity, whatever
conclusions we draw or inferences we make may be very misleading.

⁵A formal proof can be found in Phoebus J. Dhrymes, Introductory Econometrics, Springer-Verlag, New York, 1978, pp.
110-111. In passing, note that the loss of efficiency of $\hat\beta_2$ (i.e., by how much var[$\hat\beta_2$] exceeds var[$\hat\beta_2^*$]) depends on the
sample values of the X variables and the value of $\sigma_i^2$.

To throw more light on this topic, we refer to a Monte Carlo study conducted by Davidson and MacKinnon.
They consider the following simple model, which in our notation is


$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad (11.4.1)$$


They assume that $\beta_1 = 1$, $\beta_2 = 1$, and $u_i \sim N(0, X_i^{\alpha})$. As the last expression shows, they assume that the error
variance is heteroscedastic and is related to the value of the regressor X with power α. If, for example, α = 1,
the error variance is proportional to the value of X; if α = 2, the error variance is proportional to the square
of the value of X, and so on. In Section 11.6 we will consider the logic behind such a procedure. Based on
20,000 replications and allowing for various values for α, they obtain the standard errors of the two regression
coefficients using OLS (see Eq. [11.2.3]), OLS allowing for heteroscedasticity (see Eq. [11.2.2]), and GLS.⁷
Their results are as follows:

Value of α | s.e. of $\hat\beta_1$: OLS | OLShet | GLS | s.e. of $\hat\beta_2$: OLS | OLShet | GLS
0.5 | 0.164 | 0.134 | 0.110 | 0.285 | 0.277 | 0.243
1.0 | 0.142 | 0.101 | 0.048 | 0.246 | 0.247 | 0.173
2.0 | 0.116 | 0.074 | 0.0073 | 0.200 | 0.220 | 0.109
3.0 | 0.100 | 0.064 | 0.0013 | 0.173 | 0.206 | 0.056
4.0 | 0.089 | 0.059 | 0.0003 | 0.154 | 0.195 | 0.017

Note: OLShet means OLS allowing for heteroscedasticity.


The most striking feature of these results is that OLS, with or without correction for heteroscedasticity,
consistently overestimates the true standard error obtained by the (correct) GLS procedure, especially for
large values of α, thus establishing the superiority of GLS. These results also show that if we do not use
GLS and rely on OLS—allowing for or not allowing for heteroscedasticity—the picture is mixed. The usual 
OLS standard errors are either too large (for the intercept) or generally too small (for the slope coefficient) 
in relation to those obtained by OLS allowing for heteroscedasticity. The message is clear: In the presence of 
heteroscedasticity, use GLS. However, for reasons explained later in the chapter, in practice it is not always 
easy to apply GLS. Also, as we discuss later, unless heteroscedasticity is very severe, one may not abandon 
OLS in favor of GLS or WLS. 
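A scaled-down simulation in the spirit of the Davidson-MacKinnon design can be sketched as follows. The X values (equally spaced on (0, 1]), n = 100, and 5,000 replications are our own assumptions, not theirs, so the numbers will not reproduce the table above. Only the slope is tracked, and the point is qualitative: the GLS standard error is never larger than the correct OLS standard error of Eq. (11.2.2), while the usual OLS standard error can miss in either direction.

```python
import numpy as np

def monte_carlo_slope_se(alpha, n=100, reps=5_000, seed=0):
    """Slope standard errors for Y_i = 1 + X_i + u_i, u_i ~ N(0, X_i**alpha).

    Returns (average usual OLS s.e. from Eq. 11.2.3 with sigma_hat^2,
             correct OLS s.e. from Eq. 11.2.2 with the true sigma_i^2,
             GLS s.e. from Eq. 11.3.9).
    """
    rng = np.random.default_rng(seed)
    X = np.linspace(1.0 / n, 1.0, n)      # assumed fixed regressor values
    sigma2_i = X**alpha                   # var(u_i) = X_i**alpha
    x = X - X.mean()
    sxx = np.sum(x**2)

    usual_se = np.empty(reps)
    for r in range(reps):
        Y = 1.0 + 1.0 * X + rng.normal(scale=np.sqrt(sigma2_i))  # beta1 = beta2 = 1
        b2 = np.sum(x * (Y - Y.mean())) / sxx
        b1 = Y.mean() - b2 * X.mean()
        resid = Y - b1 - b2 * X
        sigma2_hat = np.sum(resid**2) / (n - 2)   # conventional estimator of sigma^2
        usual_se[r] = np.sqrt(sigma2_hat / sxx)   # usual OLS s.e., Eq. (11.2.3)

    se_ols_het = np.sqrt(np.sum(x**2 * sigma2_i) / sxx**2)   # Eq. (11.2.2)
    w = 1.0 / sigma2_i
    se_gls = np.sqrt(w.sum() / (w.sum() * (w * X**2).sum() - (w * X).sum() ** 2))
    return usual_se.mean(), se_ols_het, se_gls

for alpha in (0.5, 1.0, 2.0):
    ols, ols_het, gls = monte_carlo_slope_se(alpha)
    print(f"alpha={alpha}: OLS {ols:.4f}  OLShet {ols_het:.4f}  GLS {gls:.4f}")
```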

From the preceding discussion it is clear that heteroscedasticity is potentially a serious problem and the
researcher needs to know whether it is present in a given situation. If its presence is detected, then one can
take corrective action, such as using the weighted least-squares regression or some other technique. Before
we turn to examining the various corrective procedures, however, we must first find out whether heteroscedasticity
is present or likely to be present in a given case. This topic is discussed in the following section.

⁶From Eq. (5.3.6) we know that the 100(1 − α)% confidence interval for $\beta_2$ is $[\hat\beta_2 \pm t_{\alpha/2}\,\text{se}(\hat\beta_2)]$. But if se($\hat\beta_2$) cannot be
estimated unbiasedly, what trust can we put in the conventionally computed confidence interval?

⁷Russell Davidson and James C. MacKinnon, Estimation and Inference in Econometrics, Oxford University Press, New York,
1993, pp. 549-550.


A Technical Note 


Although we have stated that, in cases of heteroscedasticity, it is the GLS, not the OLS, that is BLUE, there
are examples where OLS can be BLUE, despite heteroscedasticity.⁸ But such examples are infrequent in
practice.


11.5 Detection of Heteroscedasticity 


As with multicollinearity, the important practical question is: How does one know that heteroscedasticity 
is present in a specific situation? Again, as in the case of multicollinearity, there are no hard-and-fast rules 
for detecting heteroscedasticity, only a few rules of thumb. But this situation is inevitable because $\sigma_i^2$ can be
known only if we have the entire Y population corresponding to the chosen X's, such as the population shown
in Table 2.1. But such data are an exception rather than the rule in most economic investigations. In this
respect the econometrician differs from scientists in fields such as agriculture and biology, where researchers
have a good deal of control over their subjects. More often than not, in economic studies there is only one
sample Y value corresponding to a particular value of X. And there is no way one can know $\sigma_i^2$ from just one
Y observation. Therefore, in most cases involving econometric investigations, heteroscedasticity may be a
matter of intuition, educated guesswork, prior empirical experience, or sheer speculation.

With the preceding caveat in mind, let us examine some of the informal and formal methods of detecting
heteroscedasticity. As the following discussion will reveal, most of these methods are based on the examination
of the OLS residuals $\hat u_i$, since they are the ones we observe, and not the disturbances $u_i$. One hopes that
they are good estimates of $u_i$, a hope that may be fulfilled if the sample size is fairly large.


Informal Methods 


Nature of the Problem 


Very often the nature of the problem under consideration suggests whether heteroscedasticity is likely to be 
encountered. For example, following the pioneering work of Prais and Houthakker on family budget studies, 
where they found that the residual variance around the regression of consumption on income increased with 
income, one now generally assumes that in similar surveys one can expect unequal variances among the 
disturbances.⁹ As a matter of fact, in cross-sectional data involving heterogeneous units, heteroscedasticity
may be the rule rather than the exception. Thus, in a cross-sectional analysis involving the investment expen- 
diture in relation to sales, rate of interest, etc., heteroscedasticity is generally expected if small-, medium-, 
and large-size firms are sampled together. 


⁸The reason for this is that the Gauss-Markov theorem provides the sufficient (but not necessary) condition for OLS to be
efficient. The necessary and sufficient condition for OLS to be BLUE is given by Kruskal's theorem. But this topic is beyond
the scope of this book. I am indebted to Michael McAleer for bringing this to my attention. For further details, see Denzil
G. Fiebig, Michael McAleer, and Robert Bartels, “Properties of Ordinary Least Squares Estimators in Regression Models with
Nonspherical Disturbances,” Journal of Econometrics, vol. 54, no. 1-3, Oct.-Dec. 1992, pp. 321-334. For the mathematically
inclined student, I discuss this topic further in Appendix C, using matrix algebra.


⁹S. J. Prais and H. S. Houthakker, The Analysis of Family Budgets, Cambridge University Press, New York, 1955.
As a matter of fact, we have already come across examples of this. In Chapter 2 we discussed the
relationship between mean, or average, hourly wages and years of schooling in the United States.
In that chapter we also discussed the relationship between expenditure on food and total expenditure for 55
families in India (see Exercise 11.16).


Graphical Method 


If there is no a priori or empirical information about the nature of heteroscedasticity, in practice one can do
the regression analysis on the assumption that there is no heteroscedasticity and then do a postmortem examination
of the squared residuals $\hat u_i^2$ to see if they exhibit any systematic pattern. Although the $\hat u_i$ are not the same
thing as the $u_i$, they can be used as proxies, especially if the sample size is sufficiently large.¹⁰ An examination
of the $\hat u_i^2$ may reveal patterns such as those shown in Figure 11.8.

In Figure 11.8, the $\hat u_i^2$ are plotted against $\hat Y_i$, the estimated $Y_i$ from the regression line, the idea being to find out
whether the estimated mean value of Y is systematically related to the squared residual. In Figure 11.8a we
see that there is no systematic pattern between the two variables, suggesting that perhaps no heteroscedasticity
is present in the data. Figures 11.8b to e, however, exhibit definite patterns. For instance, Figure 11.8c
suggests a linear relationship, whereas Figures 11.8d and e indicate a quadratic relationship between $\hat u_i^2$ and
$\hat Y_i$. Using such knowledge, albeit informal, one may transform the data in such a manner that the transformed
data do not exhibit heteroscedasticity. In Section 11.6 we shall examine several such transformations.


Figure 11.8 Hypothetical patterns of estimated squared residuals.


Instead of plotting $\hat u_i^2$ against $\hat Y_i$, one may plot them against one of the explanatory variables, especially if
plotting $\hat u_i^2$ against $\hat Y_i$ results in the pattern shown in Figure 11.8a. Such a plot, which is shown in Figure 11.9,
may reveal patterns similar to those given in Figure 11.8. (In the case of the two-variable model, plotting $\hat u_i^2$
against $\hat Y_i$ is equivalent to plotting it against $X_i$, and therefore Figure 11.9 is similar to Figure 11.8. But this
is not the situation when we consider a model involving two or more X variables; in this instance, $\hat u_i^2$ may be
plotted against any X variable included in the model.)

Figure 11.9 Scattergram of estimated squared residuals against X.

¹⁰For the relationship between $\hat u_i$ and $u_i$, see E. Malinvaud, Statistical Methods of Econometrics, North Holland Publishing
Company, Amsterdam, 1970, pp. 88-89.

A pattern such as that shown in Figure 11.9c, for instance, suggests that the variance of the disturbance 
term is linearly related to the X variable. Thus, if in the regression of savings on income one finds a pattern 
such as that shown in Figure 11.9c, it suggests that the heteroscedastic variance may be proportional to the 
value of the income variable. This knowledge may help us in transforming our data in such a manner that in 
the regression on the transformed data the variance of the disturbance is homoscedastic. We shall return to
this topic in the next section.
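In practice the graphical method amounts to computing $\hat Y_i$ and $\hat u_i^2$ from a first-pass OLS fit and plotting one against the other (or against an X variable). The sketch below uses simulated data with $\sigma_i = 0.5X_i$ (an assumption) and a hypothetical helper name. Instead of drawing the plot, it prints a crude numerical summary; any plotting library can produce a scattergram like Figure 11.8 from `Y_hat` and `u2`.

```python
import numpy as np

def squared_residuals_vs_fitted(X, Y):
    """Fit Y on X by OLS; return (Y_hat, u_hat**2) sorted by Y_hat for plotting."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x = X - X.mean()
    b2 = np.sum(x * (Y - Y.mean())) / np.sum(x**2)
    b1 = Y.mean() - b2 * X.mean()
    Y_hat = b1 + b2 * X
    u2 = (Y - Y_hat) ** 2
    order = np.argsort(Y_hat)
    return Y_hat[order], u2[order]

rng = np.random.default_rng(2)
X = np.linspace(1.0, 10.0, 200)
Y = 1.0 + 2.0 * X + rng.normal(scale=0.5 * X)   # simulated: error sd grows with X
Y_hat, u2 = squared_residuals_vs_fitted(X, Y)

# Crude stand-in for eyeballing the scattergram: average squared residual
# in the lower and upper halves of Y_hat.
half = len(u2) // 2
print(u2[:half].mean(), u2[half:].mean())       # upper half noticeably larger here
```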


Formal Methods 


Park Test$^{11}$
Park formalizes the graphical method by suggesting that $\sigma_i^2$ is some function of the explanatory variable $X_i$. The functional form he suggests is

$$\sigma_i^2 = \sigma^2 X_i^{\beta} e^{v_i}$$


11 R. E. Park, “Estimation with Heteroscedastic Error Terms,” Econometrica, vol. 34, no. 4, October 1966, p. 888. The Park test is a special case of the general test proposed by A. C. Harvey in “Estimating Regression Models with Multiplicative Heteroscedasticity,” Econometrica, vol. 44, no. 3, 1976, pp. 461-465.
or

$$\ln \sigma_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i \tag{11.5.1}$$


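where $v_i$ is the stochastic disturbance term.
Since $\sigma_i^2$ is generally not known, Park suggests using $\hat{u}_i^2$ as a proxy and running the following regression:

$$
\begin{aligned}
\ln \hat{u}_i^2 &= \ln \sigma^2 + \beta \ln X_i + v_i \\
&= \alpha + \beta \ln X_i + v_i
\end{aligned}
\tag{11.5.2}
$$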


If $\beta$ turns out to be statistically significant, it would suggest that heteroscedasticity is present in the data. If it turns out to be insignificant, we may accept the assumption of homoscedasticity. The Park test is thus a two-stage procedure. In the first stage we run the OLS regression disregarding the heteroscedasticity question. We obtain $\hat{u}_i$ from this regression, and then in the second stage we run the regression (11.5.2).
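The two stages can be sketched in plain Python. This is a minimal illustration, not a library routine: `simple_ols` and `park_test` are hypothetical helper names, the first-stage residuals are assumed to be already available, and every $X_i$ must be positive and every $\hat{u}_i$ nonzero because both enter in logs.

```python
import math


def simple_ols(x, y):
    """Two-variable OLS: returns (intercept, slope, se_slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, se_slope


def park_test(x, residuals):
    """Second stage of the Park test, Eq. (11.5.2): regress ln(u_hat^2)
    on ln(X). Requires X > 0 and nonzero residuals.
    Returns (alpha, beta, t_beta)."""
    ln_x = [math.log(xi) for xi in x]
    ln_u2 = [math.log(u * u) for u in residuals]
    alpha, beta, se_beta = simple_ols(ln_x, ln_u2)
    # Guard against an exact fit, where the standard error is zero
    t_beta = beta / se_beta if se_beta > 0 else float("nan")
    return alpha, beta, t_beta


if __name__ == "__main__":
    # Hypothetical residuals whose spread grows with X
    x = [float(i) for i in range(1, 11)]
    u = [xi * (2.0 if i % 2 else 1.0) for i, xi in enumerate(x)]
    print(park_test(x, u))
```

If the residuals were exactly proportional to $X_i$, the estimated $\beta$ would be 2, because then $\ln \hat{u}_i^2 = \text{const} + 2 \ln X_i$; an insignificant $\beta$ points toward homoscedasticity.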

Although empirically appealing, the Park test has some problems. Goldfeld and Quandt have argued that the error term $v_i$ entering into Eq. (11.5.2) may not satisfy the OLS assumptions and may itself be heteroscedastic.$^{12}$ Nonetheless, as a strictly exploratory method, one may use the Park test.


Example 11.1 Relationship between Compensation and Productivity 


To illustrate the Park approach, we use the data given in Table 11.1 to run the following regression: 


$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

where $Y_i$ = average wages per worker across industry groups in rupees and $X_i$ = average labor productivity across industry groups in rupees. The results of the regression are as follows:

$$
\begin{aligned}
\hat{Y}_i &= 1597.00 + 0.05\,X_i \\
\text{se} &= (8601.49) \quad (0.01) \\
t &= (0.19) \quad (4.05) \qquad R^2 = 0.6458
\end{aligned}
\tag{11.5.3}
$$


The results reveal that the estimated slope coefficient is significant at the 5 percent level of significance. The equation shows that as labor productivity increases by, say, a rupee, labor compensation on average increases by about 0.05 rupees.

The residuals obtained from regression (11.5.3) are then used to run regression (11.5.2), that is, $\ln \hat{u}_i^2$ is regressed on $\ln X_i$, giving the following results:

$$
\begin{aligned}
\ln \hat{u}_i^2 &= 29.25 - 0.92 \ln X_i \\
\text{se} &= (22.60) \quad (1.69) \\
t &= (1.29) \quad (-0.54)
\end{aligned}
\tag{11.5.4}
$$
Obviously, there is no statistically significant relationship between the two variables. Following the Park test, one may conclude that there is no heteroscedasticity in the error variance.$^{13}$


Glejser Test$^{14}$


The Glejser test is similar in spirit to the Park test. After obtaining the residuals $\hat{u}_i$ from the OLS regression, Glejser suggests regressing the absolute values of $\hat{u}_i$ on the X variable that is thought to be closely associated with $\sigma_i^2$. In his experiments, Glejser uses the following functional forms:

12 Stephen M. Goldfeld and Richard E. Quandt, Nonlinear Methods in Econometrics, North Holland Publishing Company, Amsterdam, 1972, pp. 93-94.

13 The particular functional form chosen by Park is only suggestive. A different functional form may reveal significant relationships. For example, one may use $\hat{u}_i^2$ as the dependent variable.

14 H. Glejser, “A New Test for Heteroscedasticity,” Journal of the American Statistical Association, vol. 64, 1969, pp. 316-323.
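$$
\begin{aligned}
|\hat{u}_i| &= \beta_1 + \beta_2 X_i + v_i \\
|\hat{u}_i| &= \beta_1 + \beta_2 \sqrt{X_i} + v_i \\
|\hat{u}_i| &= \beta_1 + \beta_2 \frac{1}{X_i} + v_i \\
|\hat{u}_i| &= \beta_1 + \beta_2 \frac{1}{\sqrt{X_i}} + v_i \\
|\hat{u}_i| &= \sqrt{\beta_1 + \beta_2 X_i} + v_i \\
|\hat{u}_i| &= \sqrt{\beta_1 + \beta_2 X_i^2} + v_i
\end{aligned}
$$

where $v_i$ is the error term.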


Again, as an empirical or practical matter, one may use the Glejser approach. But Goldfeld and Quandt point out that the error term $v_i$ has some problems in that its expected value is nonzero, it is serially correlated (see Chapter 12), and, ironically, it is heteroscedastic.$^{15}$ An additional difficulty with the Glejser method is that models such as

$$|\hat{u}_i| = \sqrt{\beta_1 + \beta_2 X_i} + v_i$$

and

$$|\hat{u}_i| = \sqrt{\beta_1 + \beta_2 X_i^2} + v_i$$

are nonlinear in the parameters and therefore cannot be estimated with the usual OLS procedure.

Glejser has found that for large samples the first four of the preceding models give generally satisfactory results in detecting heteroscedasticity. As a practical matter, therefore, the Glejser technique may be used for large samples and may be used in small samples strictly as a qualitative device to learn something about heteroscedasticity.
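The four linear forms can each be fitted by two-variable OLS of $|\hat{u}_i|$ on a transformed regressor. The plain-Python sketch below is illustrative only: the names are hypothetical, only point estimates are returned (judging significance would also need the standard errors), and the two nonlinear forms, which require nonlinear least squares, are deliberately left out.

```python
import math


def simple_ols(x, y):
    """Two-variable OLS: returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


# The four Glejser forms that are linear in beta1 and beta2
GLEJSER_FORMS = {
    "X": lambda v: v,
    "sqrt(X)": math.sqrt,
    "1/X": lambda v: 1.0 / v,
    "1/sqrt(X)": lambda v: 1.0 / math.sqrt(v),
}


def glejser(x, residuals):
    """Regress |u_hat| on each transformed regressor (X must be positive).
    Returns {form: (beta1, beta2)}."""
    abs_u = [abs(u) for u in residuals]
    return {name: simple_ols([f(v) for v in x], abs_u)
            for name, f in GLEJSER_FORMS.items()}


if __name__ == "__main__":
    x = [float(v) for v in range(1, 9)]
    # Hypothetical residuals whose magnitude grows linearly with X
    u = [(-1) ** i * (3.0 + 0.5 * v) for i, v in enumerate(x)]
    for form, (b1, b2) in glejser(x, u).items():
        print(f"|u| on {form}: beta1 = {b1:.3f}, beta2 = {b2:.3f}")
```

Because the signs of the residuals are discarded, alternating positive and negative residuals of growing size still produce a clear slope in the first form.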




Example 11.2 Relationship between Compensation and Productivity: The Glejser Test 


Continuing with Example 11.1, the absolute value of the residuals obtained from regression (11.5.3) was regressed on average productivity (X), giving the following results:

$$
\begin{aligned}
|\hat{u}_i| &= 5092.72 + 0.002\,X_i \\
\text{se} &= (4436.35) \quad (0.01) \qquad R^2 = 0.0137 \\
t &= (1.15) \quad (0.35)
\end{aligned}
\tag{11.5.5}
$$

As you can see from this regression, there is no relationship between the absolute value of the residuals and the regressor, average productivity. This reinforces the conclusion based on the Park test.


Spearman’s Rank Correlation Test 
In Exercise 3.8 we defined Spearman's rank correlation coefficient as

$$r_s = 1 - 6\left[\frac{\sum d_i^2}{n(n^2 - 1)}\right] \tag{11.5.6}$$


15 For details, see Goldfeld and Quandt, op. cit., Chapter 3.
where $d_i$ = difference in the ranks assigned to two different characteristics of the ith individual or phenomenon and n = number of individuals or phenomena ranked. The preceding rank correlation coefficient can be used to detect heteroscedasticity as follows: Assume $Y_i = \beta_1 + \beta_2 X_i + u_i$.

Step 1. Fit the regression to the data on Y and X and obtain the residuals $\hat{u}_i$.

Step 2. Ignoring the sign of $\hat{u}_i$, that is, taking their absolute value $|\hat{u}_i|$, rank both $|\hat{u}_i|$ and $X_i$ (or $\hat{Y}_i$) according to an ascending or descending order and compute the Spearman's rank correlation coefficient given previously.

Step 3. Assuming that the population rank correlation coefficient $\rho_s$ is zero and n > 8, the significance of the sample $r_s$ can be tested by the t test as follows:$^{16}$

$$t = \frac{r_s \sqrt{n - 2}}{\sqrt{1 - r_s^2}} \tag{11.5.7}$$

with df = n - 2.

If the computed t value exceeds the critical t value, we may accept the hypothesis of heteroscedasticity; otherwise we may reject it. If the regression model involves more than one X variable, $r_s$ can be computed between $|\hat{u}_i|$ and each of the X variables separately and can be tested for statistical significance by the t test given in Eq. (11.5.7).


Example 11.3 Illustration of the Rank Correlation Test
To illustrate the rank correlation test, consider the data given in Table 11.2. The data pertain to the average annual return (E, %) and the standard deviation of annual return ($\sigma$, %) of 10 mutual funds.


Table 11.2 Rank Correlation Test of Heteroscedasticity

                                                                   Rank of  Rank of
Name of Mutual Fund              E_i    σ_i    Ê_i(a)  |û_i|(b)     |û_i|     σ_i       d    d²
Boston Fund                      12.4   12.1   11.37    1.03          9        4        5    25
Delaware Fund                    14.4   21.4   15.64    1.24         10        9        1     1
Equity Fund                      14.6   18.7   14.40    0.20          4        7       -3     9
Fundamental Investors            16.0   21.7   15.78    0.22          5       10       -5    25
Investors Mutual                 11.3   12.5   11.56    0.26          6        5        1     1
Loomis-Sales Mutual Fund         10.0   10.4   10.59    0.59          7        2        5    25
Massachusetts Investors Trust    16.2   20.8   15.37    0.83          8        8        0     0
New England Fund                 10.4   10.2   10.50    0.10          3        1        2     4
Putnam Fund of Boston            13.1   16.0   13.16    0.06          2        6       -4    16
Wellington Fund                  11.3   12.0   11.33    0.03          1        3       -2     4
Total                                                                                   0   110

E_i = average annual return, %; σ_i = standard deviation of annual return, %; |û_i| = |E_i - Ê_i|; d = difference between the two rankings.
(a) Obtained from the regression: Ê_i = 5.8194 + 0.4590σ_i.
(b) Absolute value of the residuals.
Note: The ranking is in ascending order of values.

16 See G. Udny Yule and M. G. Kendall, An Introduction to the Theory of Statistics, Charles Griffin & Company, London, 1953, p. 455.
The capital market line (CML) of portfolio theory postulates a linear relationship between expected return (E) and risk (as measured by the standard deviation, $\sigma$) of a portfolio as follows:

$$E_i = \beta_1 + \beta_2 \sigma_i$$


Using the data in Table 11.2, the preceding model was estimated and the residuals from this model were computed. Since the data relate to 10 mutual funds of differing sizes and investment goals, a priori one might expect heteroscedasticity. To test this hypothesis, we apply the rank correlation test. The necessary calculations are given in Table 11.2.

Applying formula (11.5.6), we obtain

$$r_s = 1 - 6\left[\frac{110}{10(100 - 1)}\right] = 0.3333 \tag{11.5.8}$$

Applying the t test given in Eq. (11.5.7), we obtain

$$t = \frac{(0.3333)(\sqrt{8})}{\sqrt{1 - 0.1111}} = 0.9998 \tag{11.5.9}$$


For 8 df this t value is not significant even at the 10 percent level of significance; the (one-tailed) p value is about 0.17. Thus, there is no evidence of a systematic relationship between the explanatory variable and the absolute values of the residuals, which might suggest that there is no heteroscedasticity.
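The calculations behind Table 11.2 and Eqs. (11.5.8) and (11.5.9) can be reproduced in a few lines of plain Python. This sketch assumes no tied ranks (true for these data) and uses hypothetical helper names; it refits the CML regression of E on $\sigma$, ranks $|\hat{u}_i|$ and $\sigma_i$ in ascending order, and applies Eqs. (11.5.6) and (11.5.7).

```python
import math

# Table 11.2: average annual return E (%) and its standard deviation sigma (%)
E = [12.4, 14.4, 14.6, 16.0, 11.3, 10.0, 16.2, 10.4, 13.1, 11.3]
SIGMA = [12.1, 21.4, 18.7, 21.7, 12.5, 10.4, 20.8, 10.2, 16.0, 12.0]


def ranks(values):
    """Ascending ranks 1..n; assumes no ties (true for this table)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def rank_correlation_test(x, y):
    """Steps 1-3: fit y on x by OLS, rank |u_hat| and x.
    Returns (sum of d^2, r_s, t)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b2 = (sum((a - xbar) * (c - ybar) for a, c in zip(x, y))
          / sum((a - xbar) ** 2 for a in x))
    b1 = ybar - b2 * xbar
    abs_resid = [abs(c - b1 - b2 * a) for a, c in zip(x, y)]
    d = [ru - rx for ru, rx in zip(ranks(abs_resid), ranks(x))]
    sum_d2 = sum(di * di for di in d)
    r_s = 1 - 6 * sum_d2 / (n * (n * n - 1))               # Eq. (11.5.6)
    t = r_s * math.sqrt(n - 2) / math.sqrt(1 - r_s ** 2)   # Eq. (11.5.7)
    return sum_d2, r_s, t


if __name__ == "__main__":
    print(rank_correlation_test(SIGMA, E))
```

The function returns $\sum d_i^2 = 110$ and $r_s = 1/3$; with this unrounded $r_s$ the t value is exactly 1, so the 0.9998 in Eq. (11.5.9) reflects rounding $r_s$ to 0.3333.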




Goldfeld-Quandt Test$^{17}$


This popular method is applicable if one assumes that the heteroscedastic variance, $\sigma_i^2$, is positively related to one of the explanatory variables in the regression model. For simplicity, consider the usual two-variable model:

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

Suppose $\sigma_i^2$ is positively related to $X_i$ as

$$\sigma_i^2 = \sigma^2 X_i^2 \tag{11.5.10}$$

where $\sigma^2$ is a constant.$^{18}$


Assumption (11.5.10) postulates that $\sigma_i^2$ is proportional to the square of the X variable. Such an assumption has been found quite useful by Prais and Houthakker in their study of family budgets. (See Section 11.5, informal methods.)

If Eq. (11.5.10) is appropriate, it would mean that $\sigma_i^2$ would be larger, the larger the values of $X_i$. If that turns out to be the case, heteroscedasticity is most likely to be present in the model. To test this explicitly, Goldfeld and Quandt suggest the following steps:

Step 1. Order or rank the observations according to the values of $X_i$, beginning with the lowest X value.

Step 2. Omit c central observations, where c is specified a priori, and divide the remaining $(n - c)$ observations into two groups, each of $(n - c)/2$ observations.

Step 3. Fit separate OLS regressions to the first $(n - c)/2$ observations and the last $(n - c)/2$ observations, and obtain the respective residual sums of squares $\text{RSS}_1$ and $\text{RSS}_2$, $\text{RSS}_1$ representing the RSS from the regression corresponding to the smaller $X_i$ values (the small variance group) and $\text{RSS}_2$ that from the larger $X_i$ values (the large variance group). These RSS each have

$$\frac{n - c}{2} - k \quad \text{or} \quad \frac{n - c - 2k}{2} \ \text{df}$$

where k is the number of parameters to be estimated, including the intercept. (Why?) For the two-variable case k is of course 2.

17 Goldfeld and Quandt, op. cit., Chapter 3.
18 This is only one plausible assumption. Actually, what is required is that $\sigma_i^2$ be monotonically related to $X_i$.
Step 4. Compute the ratio

$$\lambda = \frac{\text{RSS}_2/\text{df}}{\text{RSS}_1/\text{df}} \tag{11.5.11}$$

If we assume $u_i$ are normally distributed (which we usually do), and if the assumption of homoscedasticity is valid, then it can be shown that $\lambda$ of Eq. (11.5.11) follows the F distribution with numerator and denominator df each of $(n - c - 2k)/2$.


If in an application the computed $\lambda$ (= F) is greater than the critical F at the chosen level of significance, we can reject the hypothesis of homoscedasticity, that is, we can say that heteroscedasticity is very likely.

Before illustrating the test, a word about omitting the c central observations is in order. These observations are omitted to sharpen or accentuate the difference between the small variance group (i.e., $\text{RSS}_1$) and the large variance group (i.e., $\text{RSS}_2$). But the ability of the Goldfeld-Quandt test to do this successfully depends on how c is chosen.$^{19}$ For the two-variable model the Monte Carlo experiments done by Goldfeld and Quandt suggest that c is about 8 if the sample size is about 30, and it is about 16 if the sample size is about 60. But Judge et al. note that c = 4 if n = 30 and c = 10 if n is about 60 have been found satisfactory in practice.$^{20}$

Before moving on, it may be noted that in case there is more than one X variable in the model, the ranking of observations, the first step in the test, can be done according to any one of them. Thus in the model $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + u_i$, we can rank-order the data according to any one of these X's. If a priori we are not sure which X variable is appropriate, we can conduct the test on each of the X variables, or via a Park test, in turn, on each X.




Example 11.4 The Goldfeld-Quandt Test 


To illustrate the Goldfeld—Quandt test, we present in Table 11.3 data on consumption expenditure in relation 
to income for a cross section of 30 families. Suppose we postulate that consumption expenditure is linearly 
related to income but that heteroscedasticity is present in the data. We further postulate that the nature of 
heteroscedasticity is as given in Eq. (11.5.10). The necessary reordering of the data for the application of the 
test is also presented in Table 11.3. 

Dropping the middle 4 observations, the OLS regressions based on the first 13 and the last 13 observations 
and their associated residual sums of squares are as shown next (standard errors in the parentheses). 

Regression based on the first 13 observations:

$$\hat{Y}_i = \underset{(8.7049)}{3.4094} + \underset{(0.0744)}{0.6968}\,X_i \qquad r^2 = 0.8887 \quad \text{RSS}_1 = 377.17 \quad \text{df} = 11$$


19 Technically, the power of the test depends on how c is chosen. In statistics, the power of a test is measured by the probability of rejecting the null hypothesis when it is false (i.e., by 1 - Prob [type II error]). Here the null hypothesis is that the variances of the two groups are the same, i.e., homoscedasticity. For further discussion, see M. M. Ali and C. Giaccotto, “A Study of Several New and Existing Tests for Heteroscedasticity in the General Linear Model,” Journal of Econometrics, vol. 26, 1984, pp. 355-373.

20 George G. Judge, R. Carter Hill, William E. Griffiths, Helmut Lütkepohl, and Tsoung-Chao Lee, Introduction to the Theory and Practice of Econometrics, John Wiley & Sons, New York, 1982, p. 422.
Table 11.3 Hypothetical Data on Consumption Expenditure Y (dollars) and Income X (dollars) to Illustrate the Goldfeld-Quandt Test

                        Data Ranked by X Values
    Y       X              Y       X
   55      80             55      80
   65     100             70      85
   70      85             75      90
   80     110             65     100
   79     120             74     105
   84     115             80     110
   98     130             84     115
   95     140             79     120
   90     125             90     125
   75      90             98     130
   74     105             95     140
  110     160            108     145
  113     150            113     150
  125     165            110     160   *
  108     145            125     165   *
  115     180            115     180   *
  140     225            130     185   *
  120     200            135     190
  145     240            120     200
  130     185            140     205
  152     220            144     210
  144     210            152     220
  175     245            140     225
  180     260            137     230
  135     190            145     240
  140     205            175     245
  178     265            189     250
  191     270            180     260
  137     230            178     265
  189     250            191     270

* Middle 4 observations.
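Regression based on the last 13 observations:

$$\hat{Y}_i = \underset{(30.6421)}{-28.0272} + \underset{(0.1319)}{0.7941}\,X_i \qquad r^2 = 0.7681 \quad \text{RSS}_2 = 1536.8 \quad \text{df} = 11$$

From these results we obtain

$$\lambda = \frac{\text{RSS}_2/\text{df}}{\text{RSS}_1/\text{df}} = \frac{1536.8/11}{377.17/11} = 4.07$$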


The critical F value for 11 numerator and 11 denominator df at the 5 percent level is 2.82. Since the estimated F (= $\lambda$) value exceeds the critical value, we may conclude that there is heteroscedasticity in the error variance. However, if the level of significance is fixed at 1 percent, we may not reject the assumption of homoscedasticity. (Why?) Note that the p value of the observed $\lambda$ is 0.014.
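Example 11.4 can be checked end to end in plain Python. The sketch below is illustrative (hypothetical function names, two-variable model only, and it assumes n - c is even); it orders the Table 11.3 data by X, drops the c central observations, fits each outer group by OLS, and forms $\lambda$. It does not compute an F p value, so the comparison with the critical value 2.82 is still made from the F table.

```python
# Table 11.3: consumption expenditure Y and income X for 30 families
Y = [55, 65, 70, 80, 79, 84, 98, 95, 90, 75, 74, 110, 113, 125, 108,
     115, 140, 120, 145, 130, 152, 144, 175, 180, 135, 140, 178, 191, 137, 189]
X = [80, 100, 85, 110, 120, 115, 130, 140, 125, 90, 105, 160, 150, 165, 145,
     180, 225, 200, 240, 185, 220, 210, 245, 260, 190, 205, 265, 270, 230, 250]


def rss_two_variable(x, y):
    """Residual sum of squares from the OLS fit of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    syy = sum((b - ybar) ** 2 for b in y)
    return syy - sxy * sxy / sxx


def goldfeld_quandt(x, y, c, k=2):
    """Steps 1-4 of the Goldfeld-Quandt test.
    Returns (RSS1, RSS2, df, lambda)."""
    pairs = sorted(zip(x, y))                 # Step 1: order by X
    m = (len(pairs) - c) // 2                 # Step 2: size of each outer group
    low, high = pairs[:m], pairs[-m:]
    rss1 = rss_two_variable([p[0] for p in low], [p[1] for p in low])    # Step 3
    rss2 = rss_two_variable([p[0] for p in high], [p[1] for p in high])
    df = m - k
    return rss1, rss2, df, (rss2 / df) / (rss1 / df)   # Step 4, Eq. (11.5.11)


if __name__ == "__main__":
    print(goldfeld_quandt(X, Y, c=4))
```

With c = 4 this reproduces $\text{RSS}_1 \approx 377.17$, $\text{RSS}_2 \approx 1536.8$, 11 df per group, and $\lambda \approx 4.07$.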
Breusch-Pagan-Godfrey Test$^{21}$


The success of the Goldfeld-Quandt test depends not only on the value of c (the number of central observations to be omitted) but also on identifying the correct X variable with which to order the observations. This limitation can be avoided if we consider the Breusch-Pagan-Godfrey (BPG) test.

To illustrate this test, consider the k-variable linear regression model

$$Y_i = \beta_1 + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i \tag{11.5.12}$$

Assume that the error variance $\sigma_i^2$ is described as

$$\sigma_i^2 = f(\alpha_1 + \alpha_2 Z_{2i} + \cdots + \alpha_m Z_{mi}) \tag{11.5.13}$$

that is, $\sigma_i^2$ is some function of the nonstochastic Z variables; some or all of the X's can serve as Z's. Specifically, assume that

$$\sigma_i^2 = \alpha_1 + \alpha_2 Z_{2i} + \cdots + \alpha_m Z_{mi} \tag{11.5.14}$$

that is, $\sigma_i^2$ is a linear function of the Z's. If $\alpha_2 = \alpha_3 = \cdots = \alpha_m = 0$, then $\sigma_i^2 = \alpha_1$, which is a constant. Therefore, to test whether $\sigma_i^2$ is homoscedastic, one can test the hypothesis that $\alpha_2 = \alpha_3 = \cdots = \alpha_m = 0$. This is the basic idea behind the Breusch-Pagan-Godfrey test. The actual test procedure is as follows.


Step 1. Estimate Eq. (11.5.12) by OLS and obtain the residuals $\hat{u}_1, \hat{u}_2, \ldots, \hat{u}_n$.

Step 2. Obtain $\tilde{\sigma}^2 = \sum \hat{u}_i^2 / n$. Recall from Chapter 4 that this is the maximum likelihood (ML) estimator of $\sigma^2$. (Note: The OLS estimator is $\sum \hat{u}_i^2/(n - k)$.)

Step 3. Construct variables $p_i$ defined as

$$p_i = \hat{u}_i^2 / \tilde{\sigma}^2$$

which is simply each residual squared divided by $\tilde{\sigma}^2$.

Step 4. Regress $p_i$ thus constructed on the Z's as

$$p_i = \alpha_1 + \alpha_2 Z_{2i} + \cdots + \alpha_m Z_{mi} + v_i \tag{11.5.15}$$

where $v_i$ is the residual term of this regression.

Step 5. Obtain the ESS (explained sum of squares) from Eq. (11.5.15) and define

$$\Theta = \tfrac{1}{2}(\text{ESS}) \tag{11.5.16}$$

Assuming $u_i$ are normally distributed, one can show that if there is homoscedasticity and if the sample size n increases indefinitely, then

$$\Theta \overset{\text{asy}}{\sim} \chi^2_{m-1} \tag{11.5.17}$$

that is, $\Theta$ follows the chi-square distribution with (m - 1) degrees of freedom. (Note: asy means asymptotically.)

Therefore, if in an application the computed $\Theta$ (= $\chi^2$) exceeds the critical $\chi^2$ value at the chosen level of significance, one can reject the hypothesis of homoscedasticity; otherwise one does not reject it.

The reader may wonder why BPG chose $\tfrac{1}{2}$ESS as the test statistic. The reasoning is slightly involved and is left for the references.$^{22}$


21 T. Breusch and A. Pagan, “A Simple Test for Heteroscedasticity and Random Coefficient Variation,” Econometrica, vol. 47, 1979, pp. 1287-1294. See also L. Godfrey, “Testing for Multiplicative Heteroscedasticity,” Journal of Econometrics, vol. 8, 1978, pp. 227-236. Because of similarity, these tests are known as Breusch-Pagan-Godfrey tests of heteroscedasticity.

22 See Adrian C. Darnell, A Dictionary of Econometrics, Edward Elgar, Cheltenham, U.K., 1994, pp. 178-179.
Example 11.5 The Breusch-Pagan-Godfrey (BPG) Test

As an example, let us revisit the data (Table 11.3) that were used to illustrate the Goldfeld-Quandt heteroscedasticity test. Regressing Y on X, we obtain the following:

Step 1.

$$
\begin{aligned}
\hat{Y}_i &= 9.2903 + 0.6378\,X_i \\
\text{se} &= (5.2314) \quad (0.0286) \qquad \text{RSS} = 2361.153 \qquad R^2 = 0.9466
\end{aligned}
\tag{11.5.18}
$$

Step 2.

$$\tilde{\sigma}^2 = \sum \hat{u}_i^2 / 30 = 2361.153/30 = 78.7051$$

Step 3. Divide the squared residuals $\hat{u}_i^2$ obtained from regression (11.5.18) by 78.7051 to construct the variable $p_i$.

Step 4. Assuming that $p_i$ are linearly related to $X_i$ (= $Z_i$) as per Eq. (11.5.14), we obtain the regression

$$
\begin{aligned}
\hat{p}_i &= -0.7426 + 0.0101\,X_i \\
\text{se} &= (0.7529) \quad (0.0041) \qquad \text{ESS} = 10.4280 \qquad R^2 = 0.18
\end{aligned}
\tag{11.5.19}
$$

Step 5.

$$\Theta = \tfrac{1}{2}(\text{ESS}) = 5.2140 \tag{11.5.20}$$


Under the assumptions of the BPG test, $\Theta$ in Eq. (11.5.20) asymptotically follows the chi-square distribution with 1 df. (Note: There is only one regressor in Eq. [11.5.19].) Now from the chi-square table we find that for 1 df the 5 percent critical chi-square value is 3.8414 and the 1 percent critical $\chi^2$ value is 6.6349. Thus, the observed chi-square value of 5.2140 is significant at the 5 percent but not the 1 percent level of significance. Therefore, we reach the same conclusion as the Goldfeld-Quandt test. But keep in mind that, strictly speaking, the BPG test is an asymptotic, or large-sample, test and in the present example 30 observations may not constitute a large sample. It should also be pointed out that in small samples the test is sensitive to the assumption that the disturbances $u_i$ are normally distributed. Of course, we can test the normality assumption by the tests discussed in Chapter 5.$^{23}$
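Steps 1 to 5 of Example 11.5 can be sketched in plain Python for the case of a single Z variable (taken to be X, as in the example). Because $\Theta$ then has one degree of freedom, its upper-tail probability has the closed form $\text{erfc}(\sqrt{\Theta/2})$, which the sketch uses to attach a p value. The helper names are hypothetical, and this is an illustration rather than a general k-variable implementation.

```python
import math

# Table 11.3: consumption expenditure Y and income X for 30 families
Y = [55, 65, 70, 80, 79, 84, 98, 95, 90, 75, 74, 110, 113, 125, 108,
     115, 140, 120, 145, 130, 152, 144, 175, 180, 135, 140, 178, 191, 137, 189]
X = [80, 100, 85, 110, 120, 115, 130, 140, 125, 90, 105, 160, 150, 165, 145,
     180, 225, 200, 240, 185, 220, 210, 245, 260, 190, 205, 265, 270, 230, 250]


def simple_ols(x, y):
    """Two-variable OLS: returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
    return ybar - slope * xbar, slope


def bpg_test(x, y, z):
    """BPG test with one Z variable.
    Returns (sigma2_ml, ESS, Theta, p value)."""
    n = len(y)
    b1, b2 = simple_ols(x, y)                                  # Step 1
    u2 = [(yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)]
    sigma2_ml = sum(u2) / n                                    # Step 2: divide by n
    p = [v / sigma2_ml for v in u2]                            # Step 3
    a1, a2 = simple_ols(z, p)                                  # Step 4
    p_bar = sum(p) / n
    ess = sum((a1 + a2 * zi - p_bar) ** 2 for zi in z)         # Step 5
    theta = ess / 2
    # Chi-square with 1 df: P(chi2 > theta) = erfc(sqrt(theta / 2))
    return sigma2_ml, ess, theta, math.erfc(math.sqrt(theta / 2))


if __name__ == "__main__":
    print(bpg_test(X, Y, z=X))
```

This gives $\tilde{\sigma}^2 \approx 78.705$, ESS $\approx 10.43$, and $\Theta \approx 5.214$, matching Eqs. (11.5.18) to (11.5.20); the p value is about 0.02, consistent with significance at 5 but not 1 percent.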


White’s General Heteroscedasticity Test 


Unlike the Goldfeld-Quandt test, which requires reordering the observations with respect to the X variable that supposedly caused heteroscedasticity, or the BPG test, which is sensitive to the normality assumption, the general test of heteroscedasticity proposed by White does not rely on the normality assumption and is easy to implement.$^{24}$ As an illustration of the basic idea, consider the following three-variable regression model (the generalization to the k-variable model is straightforward):

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \tag{11.5.21}$$

The White test proceeds as follows:


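Step 1. Given the data, we estimate Eq. (11.5.21) and obtain the residuals, $\hat{u}_i$.
Step 2. We then run the following (auxiliary) regression:$^{25}$

$$\hat{u}_i^2 = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \alpha_4 X_{2i}^2 + \alpha_5 X_{3i}^2 + \alpha_6 X_{2i} X_{3i} + v_i \tag{11.5.22}$$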


23 On this, see R. Koenker, “A Note on Studentizing a Test for Heteroscedasticity,” Journal of Econometrics, vol. 17, 1981, pp. 1180-1200.

24 H. White, “A Heteroscedasticity Consistent Covariance Matrix Estimator and a Direct Test of Heteroscedasticity,” Econometrica, vol. 48, 1980, pp. 817-818.

25 Implied in this procedure is the assumption that the error variance of $u_i$, $\sigma_i^2$, is functionally related to the regressors, their squares, and their cross products. If all the partial slope coefficients in this regression are simultaneously equal to zero, then the error variance is the homoscedastic constant equal to $\alpha_1$.
That is, the squared residuals from the original regression are regressed on the original X variables or regressors, their squared values, and the cross product(s) of the regressors. Higher powers of regressors can also be introduced. Note that there is a constant term in this equation even though the original regression may or may not contain it. Obtain the $R^2$ from this (auxiliary) regression.

Step 3. Under the null hypothesis that there is no heteroscedasticity, it can be shown that sample size (n) times the $R^2$ obtained from the auxiliary regression asymptotically follows the chi-square distribution with df equal to the number of regressors (excluding the constant term) in the auxiliary regression. That is,

$$n \cdot R^2 \overset{\text{asy}}{\sim} \chi^2_{\text{df}} \tag{11.5.23}$$

where df is as defined previously. In our example, there are 5 df since there are 5 regressors in the auxiliary regression.

Step 4. If the chi-square value obtained in Eq. (11.5.23) exceeds the critical chi-square value at the chosen level of significance, the conclusion is that there is heteroscedasticity. If it does not exceed the critical chi-square value, there is no heteroscedasticity, which is to say that in the auxiliary regression (11.5.22), $\alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = \alpha_6 = 0$ (see footnote 25).
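The four steps for the three-variable model can be sketched in plain Python. Since the auxiliary regression has six coefficients, the sketch solves the OLS normal equations $(X'X)b = X'y$ by Gauss-Jordan elimination; that solver, the helper names, and the made-up demonstration data are assumptions for illustration, not part of White's paper. A production version would use a numerically sturdier least-squares routine.

```python
def ols(columns, y):
    """OLS coefficients from the normal equations (X'X)b = X'y, solved by
    Gauss-Jordan elimination with partial pivoting. `columns` holds the
    regressor columns; include a column of ones for an intercept."""
    k = len(columns)
    aug = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
           + [sum(a * b for a, b in zip(columns[i], y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def fitted(columns, coef):
    """Fitted values Xb for column-stored regressors."""
    return [sum(c * col[i] for c, col in zip(coef, columns))
            for i in range(len(columns[0]))]


def white_test(y, x2, x3):
    """White's test for the three-variable model (11.5.21).
    Returns (n * R^2, df) from the auxiliary regression (11.5.22)."""
    n = len(y)
    ones = [1.0] * n
    base = [ones, x2, x3]
    # Step 1: squared OLS residuals of the original model
    u2 = [(a - b) ** 2 for a, b in zip(y, fitted(base, ols(base, y)))]
    # Step 2: auxiliary regression on levels, squares and the cross product
    aux = [ones, x2, x3,
           [a * a for a in x2], [b * b for b in x3],
           [a * b for a, b in zip(x2, x3)]]
    u2_hat = fitted(aux, ols(aux, u2))
    mean_u2 = sum(u2) / n
    rss = sum((a - b) ** 2 for a, b in zip(u2, u2_hat))
    tss = sum((a - mean_u2) ** 2 for a in u2)
    # Step 3: n R^2 is asymptotically chi-square; df = regressors less constant
    return n * (1 - rss / tss), len(aux) - 1


if __name__ == "__main__":
    # Hypothetical data whose error spread grows with x2
    x2 = [float(i) for i in range(1, 21)]
    x3 = [float((7 * i) % 5 + 1) for i in range(1, 21)]
    y = [1 + 2 * a - b + (0.1 * a if i % 2 == 0 else -0.1 * a)
         for i, (a, b) in enumerate(zip(x2, x3))]
    stat, df = white_test(y, x2, x3)
    print(f"n*R^2 = {stat:.3f} with {df} df")
```

Step 4 is then a table lookup: compare $n \cdot R^2$ with the critical chi-square value for 5 df.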


Example 11.6 White’s Heteroscedasticity Test 


From cross-sectional data on 41 countries, Stephen Lewis estimated the following regression model:$^{26}$

$$\ln Y_i = \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i \tag{11.5.24}$$

where Y = ratio of trade taxes (import and export taxes) to total government revenue, $X_2$ = ratio of the sum of exports plus imports to GNP, and $X_3$ = GNP per capita; and ln stands for natural log. His hypotheses were that Y and $X_2$ would be positively related (the higher the trade volume, the higher the trade tax revenue) and that Y and $X_3$ would be negatively related (as income increases, government finds it is easier to collect direct taxes, e.g., income tax, than it is to rely on trade taxes).

The empirical results supported the hypotheses. For our purpose, the important point is whether there is 
heteroscedasticity in the data. Since the data are cross-sectional involving a heterogeneity of countries, a priori 
one would expect heteroscedasticity in the error variance. By applying White’s heteroscedasticity test to the 
residuals obtained from regression (11.5.24), the following results were obtained:$^{27}$


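$$
\begin{aligned}
\hat{u}_i^2 = {} & -5.8417 + 2.5629 \ln \text{Trade}_i + 0.6918 \ln \text{GNP}_i \\
& - 0.4081 (\ln \text{Trade}_i)^2 - 0.0491 (\ln \text{GNP}_i)^2 \\
& + 0.0015 (\ln \text{Trade}_i)(\ln \text{GNP}_i) \qquad R^2 = 0.1148
\end{aligned}
\tag{11.5.25}
$$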


Note: The standard errors are not given, as they are not pertinent for our purpose here. 

Now $n \cdot R^2 = 41(0.1148) = 4.7068$, which has, asymptotically, a chi-square distribution with 5 df (why?). The 5 percent critical chi-square value for 5 df is 11.0705, the 10 percent critical value is 9.2363, and the 25 percent critical value is 6.62568. For all practical purposes, one can conclude, on the basis of the White test, that there is no heteroscedasticity.
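The critical values quoted above can be checked, and a p value attached to $n \cdot R^2$, with the standard closed-form upper tail of the chi-square distribution for an odd number of degrees of freedom. The sketch hard-codes df = 5, which is specific to this example, and uses only `math.erfc` and `math.exp` from the Python standard library; `chi2_sf_df5` is a hypothetical name.

```python
import math


def chi2_sf_df5(x):
    """P(chi-square with 5 df > x), via the closed form for odd df:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * (1 + x/3)."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))


n, r2 = 41, 0.1148          # sample size and auxiliary R^2 from Eq. (11.5.25)
stat = n * r2               # White statistic, Eq. (11.5.23)
for crit in (11.0705, 9.2363, 6.62568):   # 5, 10, 25 percent critical values
    print(f"P(chi2_5 > {crit}) = {chi2_sf_df5(crit):.3f}")
print(f"n*R^2 = {stat:.4f}, p value = {chi2_sf_df5(stat):.3f}")
```

The three critical values come back with tail probabilities of about 0.05, 0.10, and 0.25, and the p value of the observed statistic is roughly 0.45, well above any conventional significance level.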


26 Stephen R. Lewis, “Government Revenue from Foreign Trade,” Manchester School of Economics and Social Studies, vol. 31, 1963, pp. 39-47.

27 These results, with change in notation, are reproduced from William F. Lott and Subhash C. Ray, Applied Econometrics: Problems with Data Sets, Instructor's Manual, Chapter 22, pp. 137-140.
A comment is in order regarding the White test. If a model has several regressors, then introducing all the regressors, their squared (or higher-powered) terms, and their cross products can quickly consume degrees of freedom. Therefore, one must use caution in using the test.$^{28}$

In cases where the White test statistic given in Eq. (11.5.23) is statistically significant, the cause may not necessarily be heteroscedasticity but specification errors, about which more will be said in Chapter 13 (recall point 5 of Section 11.1). In other words, the White test can be a test of (pure) heteroscedasticity or specification error or both. It has been argued that if no cross-product terms are present in the White test procedure, then it is a test of pure heteroscedasticity. If cross-product terms are present, then it is a test of both heteroscedasticity and specification bias.$^{29}$


Other Tests of Heteroscedasticity 


There are several other tests of heteroscedasticity, each based on certain assumptions. The interested reader may want to consult the references.$^{30}$ We mention but one of these tests because of its simplicity. This is the Koenker-Bassett (KB) test. Like the Park, Breusch-Pagan-Godfrey, and White's tests of heteroscedasticity, the KB test is based on the squared residuals, $\hat{u}_i^2$, but instead of being regressed on one or more regressors, the squared residuals are regressed on the squared estimated values of the regressand. Specifically, if the original model is:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \tag{11.5.26}$$

you estimate this model, obtain $\hat{u}_i$ from this model, and then estimate

$$\hat{u}_i^2 = \alpha_1 + \alpha_2 (\hat{Y}_i)^2 + v_i \tag{11.5.27}$$

where $\hat{Y}_i$ are the estimated values from the model (11.5.26). The null hypothesis is that $\alpha_2 = 0$. If this is not rejected, then one could conclude that there is no heteroscedasticity. The null hypothesis can be tested by the usual t test or the F test. (Note that $F_{1,k} = t_k^2$.) If the model (11.5.26) is double log, then the squared residuals are regressed on $(\log \hat{Y}_i)^2$. One other advantage of the KB test is that it is applicable even if the error term in the original model (11.5.26) is not normally distributed. If you apply the KB test to Example 11.1, you will find that the slope coefficient in the regression of the squared residuals obtained from Eq. (11.5.3) on the squared estimated values $\hat{Y}_i^2$ from Eq. (11.5.3) is statistically not different from zero, thus reinforcing the Park test. This result should not be surprising since in the present instance we only have a single regressor. But the KB test is applicable if there is one regressor or many.
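A minimal sketch of the KB regression (11.5.27) for a model with a single regressor, in plain Python. Table 11.1 is not reproduced in this section, so the sketch borrows the Table 11.3 data purely for illustration; the function names are hypothetical, and with several regressors the first stage would need a multiple-regression solver.

```python
import math

# Table 11.3 data (consumption Y, income X), borrowed only for illustration
Y = [55, 65, 70, 80, 79, 84, 98, 95, 90, 75, 74, 110, 113, 125, 108,
     115, 140, 120, 145, 130, 152, 144, 175, 180, 135, 140, 178, 191, 137, 189]
X = [80, 100, 85, 110, 120, 115, 130, 140, 125, 90, 105, 160, 150, 165, 145,
     180, 225, 200, 240, 185, 220, 210, 245, 260, 190, 205, 265, 270, 230, 250]


def simple_ols(x, y):
    """Two-variable OLS: returns (intercept, slope, se_slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return intercept, slope, math.sqrt(rss / (n - 2) / sxx)


def koenker_bassett(x, y):
    """KB test for a one-regressor model: regress u_hat^2 on Y_hat^2 as in
    Eq. (11.5.27). Returns (alpha2, t statistic for H0: alpha2 = 0)."""
    b1, b2, _ = simple_ols(x, y)
    y_hat = [b1 + b2 * v for v in x]
    u2 = [(obs - fit) ** 2 for obs, fit in zip(y, y_hat)]
    _, alpha2, se = simple_ols([fit ** 2 for fit in y_hat], u2)
    return alpha2, alpha2 / se


if __name__ == "__main__":
    alpha2, t = koenker_bassett(X, Y)
    print(f"alpha2 = {alpha2:.6f}, t = {t:.3f} (df = {len(X) - 2})")
```

On these data the slope on $\hat{Y}_i^2$ is positive, in line with the variance rising with income found in Examples 11.4 and 11.5; the t ratio is then compared with the t distribution with n - 2 df.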


A Note Regarding the Tests of Heteroscedasticity 


We have discussed several tests of heteroscedasticity in this section. So how do we decide which is the best 
test? This is not an easy question to answer, for these tests are based on various assumptions. In comparing 
the tests, we need to pay attention to their size (or level of significance), power (the probability of rejecting a 
false hypothesis), and sensitivity to outliers. 


28 Sometimes the test can be modified to conserve degrees of freedom. See Exercise 11.18.

29 See Richard Harris, Using Cointegration Analysis in Econometric Modelling, Prentice Hall & Harvester Wheatsheaf, U.K., 1995, p. 68.

30 See M. J. Harrison and B. P. McCabe, “A Test for Heteroscedasticity Based on Ordinary Least Squares Residuals,” Journal of the American Statistical Association, vol. 74, 1979, pp. 494-499; J. Szroeter, “A Class of Parametric Tests for Heteroscedasticity in Linear Econometric Models,” Econometrica, vol. 46, 1978, pp. 1311-1327; M. A. Evans and M. L. King, “A Further Class of Tests for Heteroscedasticity,” Journal of Econometrics, vol. 37, 1988, pp. 265-276; and R. Koenker and G. Bassett, “Robust Tests for Heteroscedasticity Based on Regression Quantiles,” Econometrica, vol. 50, 1982, pp. 43-61.
We have already pointed out some of the limitations of the popular and easy-to-apply White's test of heteroscedasticity. As a result of these limitations, it may have low power against the alternatives. Besides, the test is of little help in identifying the factors or variables that cause heteroscedasticity.

Similarly, the Breusch-Pagan-Godfrey test is sensitive to the assumption of normality. In contrast, the test of Koenker-Bassett does not rely on the normality assumption and may therefore be more powerful.$^{31}$ In the Goldfeld-Quandt test, if we omit too many observations, we may diminish the power of the test.

It is beyond the scope of this text to provide a comparative analysis of the various heteroscedasticity tests. But the interested reader may refer to the article by John Lyon and Chih-Ling Tsai to get some idea about the strengths and weaknesses of the various tests of heteroscedasticity.$^{32}$


11.6 Remedial Measures 


As we have seen, heteroscedasticity does not destroy the unbiasedness and consistency properties of the OLS estimators, but they are no longer efficient, not even asymptotically (i.e., in large samples). This lack of efficiency makes the usual hypothesis-testing procedure of dubious value. Therefore, remedial measures may be called for. There are two approaches to remediation: when $\sigma_i^2$ is known and when $\sigma_i^2$ is not known.


When $\sigma_i^2$ Is Known: The Method of Weighted Least Squares


As we have seen in Section 11.3, if $\sigma_i^2$ is known, the most straightforward method of correcting heteroscedasticity is by means of weighted least squares, for the estimators thus obtained are BLUE.


Example 11.7 Illustration of the Method of Weighted Least Squares

To illustrate the method, we study the linear production function relationship between output and the inputs capital and labor for 16 manufacturing industry groups (2-digit NIC codes) for the year 1998-99 for India. Similar to Table 11.1, all the variables are averaged across three states of India, namely, Andhra Pradesh, Bihar, and Gujarat. The data are presented in Table 11.4.

Letting Y represent average output across industry groups, $X_2$ the labour employed, and $X_3$ the fixed capital employed in the industry groups, we run the following regression (see Eq. [11.3.6]):

$$\frac{Y_i}{\sigma_i} = \beta_1 \left(\frac{1}{\sigma_i}\right) + \beta_2 \left(\frac{X_{2i}}{\sigma_i}\right) + \beta_3 \left(\frac{X_{3i}}{\sigma_i}\right) + \left(\frac{u_i}{\sigma_i}\right) \tag{11.6.1}$$

where $\sigma_i$ are the standard deviations of output, given in Table 11.4.


Table 11.4 Output (in Rs. lakh), Capital (in Rs. lakh), and Labour Employed (in numbers) for Manufacturing Industry Groups in India, 1998-99

Manufacturing Industry Group                                   Output      Labour     Capital    Standard Deviation
Food products and beverages                                    979,316      65,973     76,684       743,464
Tobacco products                                               118,474     110,594    129,288        97,772
Textile                                                        527,984      63,128    436,342       698,109
Tanning and dressing of leather and leather products            15,294   [illegible]   14,150         6,065
Wood and wood products except furniture                          8,112       1,780      1,389         5,626
Paper and paper products                                        68,406       7,628     29,208        60,972
Publishing, printing and reproduction of recorded media         30,534       3,382      7,422        22,004
Chemicals and chemical products                              1,666,160      49,497    982,036     2,300,124
Rubber and plastic products                                    216,643      10,312     76,190       244,388
Other non-metallic mineral products                            204,436       3,298    120,660       164,408
Basic metals                                                   465,908      20,890    601,234       408,409
Fabricated metal products except machinery and equipment        63,510   [illegible]   86,013        66,515
Machinery and equipment                                        197,466      19,509     61,683       217,605
Electrical machinery and apparatus                             141,994      10,771     26,920       125,666
Transport equipment other than motor vehicles and trailers      75,524       6,972     37,368        99,542
Furniture                                                       16,018       5,904      7,976        22,342

Source: Annual Survey of Industries, 1998-99, Central Statistical Organization, Government of India.

31 For details, see William H. Greene, Econometric Analysis, 6th ed., Pearson/Prentice-Hall, New Jersey, 2008, pp. 165-167.
32 See their article, “A Comparison of Tests of Heteroscedasticity,” The Statistician, vol. 45, no. 3, 1996, pp. 337-349.


Before going on to the regression results, note that Eq. (11.6.1) has no intercept term (why?). Therefore, one will have to use the regression-through-the-origin model to estimate $\beta_1$, $\beta_2$, and $\beta_3$, a topic discussed in Chapter 6. Also note another interesting feature of Eq. (11.6.1): It has three explanatory variables, $(1/\sigma_i)$, $(X_{2i}/\sigma_i)$, and $(X_{3i}/\sigma_i)$, whereas if we were to use OLS, regressing output on labor and capital, that regression would have two explanatory variables, $X_2$ and $X_3$ (why?).
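The weighting in Eq. (11.6.1) can be sketched in a few lines of Python. This is a minimal illustration, assuming the $\sigma_i$ are known: every variable, including the column of ones that carries $\beta_1$, is divided by $\sigma_i$, and OLS is then run through the origin. The helper names (`solve`, `ols_through_origin`, `wls`) and the toy numbers are hypothetical; they are not the Table 11.4 data.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_through_origin(rows, y):
    """OLS with no intercept: solve the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def wls(x2, x3, y, sigma):
    """Weighted least squares as in Eq. (11.6.1): divide everything by sigma_i."""
    rows = [[1 / s, a / s, b / s] for a, b, s in zip(x2, x3, sigma)]
    y_star = [yi / s for yi, s in zip(y, sigma)]
    return ols_through_origin(rows, y_star)  # [b1, b2, b3]


# Hypothetical noise-free data, so WLS must recover the true betas (10, 2, 0.5).
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0]
sigma = [1.0, 2.0, 0.5, 4.0, 3.0]
y = [10 + 2 * a + 0.5 * b for a, b in zip(x2, x3)]
print(wls(x2, x3, y, sigma))  # approximately [10.0, 2.0, 0.5]
```

Because the toy data contain no error term, any valid weighting returns the true coefficients; with real data the weights change the estimates, which is why Eqs. (11.6.2) and (11.6.3) differ.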

The regression results of WLS are as follows: 


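$$\begin{aligned}
\widehat{\left(\frac{Y_i}{\sigma_i}\right)} &= 5785.63\left(\frac{1}{\sigma_i}\right) + 0.46\left(\frac{X_{2i}}{\sigma_i}\right) + 0.81\left(\frac{X_{3i}}{\sigma_i}\right) \\
\text{se} &= (2959.98) \qquad (0.62) \qquad (0.24) \\
t &= (1.95) \qquad (0.74) \qquad (3.35) \\
R^2 &= 0.685673
\end{aligned} \tag{11.6.2}$$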


For comparison, we give the usual or unweighted OLS regression results: 
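$$\begin{aligned}
\hat{Y}_i &= 22609.29 + 2.66\,X_{2i} + 1.23\,X_{3i} \\
\text{se} &= (89409.47) \qquad (2.32) \qquad (0.26) \\
t &= (0.25) \qquad (1.15) \qquad (4.68) \\
R^2 &= 0.6583
\end{aligned} \tag{11.6.3}$$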


In Exercise 11.7 you are asked to compare these two regressions. 


When $\sigma_i^2$ is not Known


As noted earlier, if the true $\sigma_i^2$ are known, we can use the WLS method to obtain BLUE estimators. Since the true $\sigma_i^2$ are rarely known, is there a way of obtaining consistent (in the statistical sense) estimates of the variances and covariances of OLS estimators even if there is heteroscedasticity? The answer is yes.


33 As noted in footnote 3 of Chapter 6, the $R^2$ of the regression through the origin is not directly comparable with the $R^2$ of the intercept-present model. The reported adjusted $R^2$ takes this into account.

White’s Heteroscedasticity-Consistent Variances and Standard Errors


White has shown that this estimation can be performed so that asymptotically valid (i.e., large-sample) statistical inferences can be made about the true parameter values.34 We will not present the mathematical details, for they are beyond the scope of this book. However, Appendix 11A.4 outlines White’s procedure. Nowadays, several computer packages present White’s heteroscedasticity-corrected variances and standard errors along with the usual OLS variances and standard errors.35 Incidentally, White’s heteroscedasticity-corrected standard errors are also known as robust standard errors.
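For the two-variable model, the idea behind White's correction can be sketched as follows, assuming the basic form of the estimator (often labelled HC0): the unknown $\sigma_i^2$ in the heteroscedastic OLS variance formula $\sum x_i^2\sigma_i^2/(\sum x_i^2)^2$ are replaced by the squared OLS residuals $\hat{u}_i^2$. The function name and the four data points are hypothetical, and regression packages may apply small-sample adjustments that this sketch omits.

```python
import math


def ols_with_white_se(x, y):
    """Two-variable OLS returning the slope, its usual se, and White's (HC0) se."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dev = [xi - x_bar for xi in x]                      # deviations x_i
    sxx = sum(d * d for d in dev)
    b2 = sum(d * (yi - y_bar) for d, yi in zip(dev, y)) / sxx
    b1 = y_bar - b2 * x_bar
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    sigma2_hat = sum(u * u for u in resid) / (n - 2)    # homoscedastic estimate
    ols_var = sigma2_hat / sxx
    # White: plug each squared residual in for its own unknown sigma_i^2.
    white_var = sum(d * d * u * u for d, u in zip(dev, resid)) / sxx ** 2
    return b2, math.sqrt(ols_var), math.sqrt(white_var)


x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0]
b2, se_ols, se_white = ols_with_white_se(x, y)
print(round(b2, 4), round(se_ols, 4), round(se_white, 4))  # 1.1 0.5196 0.2379
```

In this toy case White's standard error is smaller than the usual one, a reminder that the correction does not always inflate standard errors.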


Example 11.8 Illustration of White’s Procedure


As an example, we quote the following results due to Greene:36


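$$\begin{aligned}
\hat{Y}_i &= 832.91 - 1834.2\,(\text{Income}) + 1587.04\,(\text{Income})^2 \\
\text{OLS se} &= (327.3) \qquad (829.0) \qquad (519.1) \\
t &= (2.54) \qquad (-2.21) \qquad (3.06) \\
\text{White se} &= (460.9) \qquad (1243.0) \qquad (830.0) \\
t &= (1.81) \qquad (-1.48) \qquad (1.91)
\end{aligned} \tag{11.6.4}$$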


where Y = per capita expenditure on public schools by state in 1979 and Income = per capita income by state 
in 1979. The sample consisted of 50 states plus Washington, DC. 


As the preceding results show, White’s heteroscedasticity-corrected standard errors are considerably larger than the OLS standard errors, and therefore the estimated t values are much smaller than those obtained by OLS. On the basis of the OLS standard errors, both regressors are statistically significant at the 5 percent level, whereas on the basis of White’s standard errors they are not. It should be pointed out, however, that White’s heteroscedasticity-corrected standard errors can be larger or smaller than the uncorrected standard errors.
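Each t value in Eq. (11.6.4) is simply the coefficient divided by its standard error, so the change in conclusions can be checked directly. The sketch below recomputes the t ratios and compares them with an approximate two-sided 5 percent critical value of 2.01, assuming 51 observations and three estimated parameters (48 df).

```python
# Coefficients and standard errors reported in Eq. (11.6.4).
labels = ["Intercept", "Income", "Income^2"]
coefs = [832.91, -1834.2, 1587.04]
ols_se = [327.3, 829.0, 519.1]
white_se = [460.9, 1243.0, 830.0]
T_CRIT = 2.01  # approximate two-sided 5% critical t value with 48 df

ols_t = [round(b / s, 2) for b, s in zip(coefs, ols_se)]
white_t = [round(b / s, 2) for b, s in zip(coefs, white_se)]

for name, t_ols, t_white in zip(labels, ols_t, white_t):
    print(f"{name:>9}: OLS t = {t_ols:6.2f} (significant: {abs(t_ols) > T_CRIT}), "
          f"White t = {t_white:6.2f} (significant: {abs(t_white) > T_CRIT})")
```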

Since White’s heteroscedasticity-consistent estimators of the variances are now available in established 
regression packages, it is recommended that the reader report them. As Wallace and Silver note: 

Generally speaking, it is probably a good idea to use the WHITE option [available in regression programs] routinely, perhaps comparing the output with regular OLS output as a check to see whether heteroscedasticity is a serious problem in a particular set of data.37




Plausible Assumptions about Heteroscedasticity Pattern 


Apart from being a large-sample procedure, the White procedure has another drawback: the estimators thus obtained may not be as efficient as those obtained by methods that transform the data to reflect specific types of heteroscedasticity. To illustrate this, let us return to the two-variable regression model:


$$Y_i = \beta_1 + \beta_2 X_i + u_i$$


We now consider several assumptions about the pattern of heteroscedasticity. 


34 See H. White, op. cit.

35 More technically, they are known as heteroscedasticity-consistent covariance matrix estimators.
36 William H. Greene, Econometric Analysis, 2d ed., Macmillan, New York, 1993, p. 385.

37 T. Dudley Wallace and J. Lew Silver, Econometrics: An Introduction, Addison-Wesley, Reading, Mass., 1988, p. 265.
Assumption 1 The error variance is proportional to $X_i^2$:
$$E(u_i^2) = \sigma^2 X_i^2 \tag{11.6.5}$$
If, as a matter of “speculation,” graphical methods, or the Park and Glejser approaches, it is believed that the variance of $u_i$ is proportional to the square of the explanatory variable X (see Figure 11.10), one may transform the original model by dividing it through by $X_i$:


$$\frac{Y_i}{X_i} = \frac{\beta_1}{X_i} + \beta_2 + \frac{u_i}{X_i} = \beta_1\frac{1}{X_i} + \beta_2 + v_i \tag{11.6.6}$$


where $v_i$ is the transformed disturbance term, equal to $u_i/X_i$. Now it is easy to verify that


$$E(v_i^2) = E\left(\frac{u_i}{X_i}\right)^2 = \frac{1}{X_i^2}E(u_i^2) = \sigma^2 \quad \text{using (11.6.5)}$$


Hence the variance of $v_i$ is now homoscedastic, and one may proceed to apply OLS to the transformed equation (11.6.6), regressing $Y_i/X_i$ on $1/X_i$.


Figure 11.10 Error variance proportional to $X^2$.


Notice that in the transformed regression the intercept term $\beta_2$ is the slope coefficient in the original equation and the slope coefficient $\beta_1$ is the intercept term in the original model. Therefore, to get back to the original model we shall have to multiply the estimated Eq. (11.6.6) by $X_i$. An application of this transformation is given in Exercise 11.20.
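Under Assumption 1 the estimation reduces to an ordinary two-variable OLS fit on the transformed data, followed by swapping the roles of the two coefficients. A minimal sketch, with hypothetical function names and noise-free illustrative data, so the true $\beta_1 = 4$ and $\beta_2 = 2$ should come back up to rounding:

```python
def ols(x, y):
    """Two-variable OLS with intercept; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def fit_variance_prop_to_x_squared(x, y):
    """Assumption 1: regress Y/X on 1/X, then map back to (beta1, beta2)."""
    inv_x = [1 / xi for xi in x]
    y_over_x = [yi / xi for xi, yi in zip(x, y)]
    intercept_t, slope_t = ols(inv_x, y_over_x)
    # The transformed slope estimates beta1; the transformed intercept estimates beta2.
    return slope_t, intercept_t


x = [1.0, 2.0, 4.0, 5.0, 10.0]
y = [4 + 2 * xi for xi in x]                   # hypothetical data: beta1 = 4, beta2 = 2
print(fit_variance_prop_to_x_squared(x, y))    # approximately (4.0, 2.0)
```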


38 Recall that we have already encountered this assumption in our discussion of the Goldfeld-Quandt test.
Assumption 2 The error variance is proportional to $X_i$. The square root transformation:


$$E(u_i^2) = \sigma^2 X_i \tag{11.6.7}$$


If it is believed that the variance of $u_i$, instead of being proportional to the squared $X_i$, is proportional to $X_i$ itself, then the original model can be transformed as follows (see Figure 11.11):


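$$\frac{Y_i}{\sqrt{X_i}} = \frac{\beta_1}{\sqrt{X_i}} + \beta_2\sqrt{X_i} + \frac{u_i}{\sqrt{X_i}} = \beta_1\frac{1}{\sqrt{X_i}} + \beta_2\sqrt{X_i} + v_i \tag{11.6.8}$$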


where $v_i = u_i/\sqrt{X_i}$ and where $X_i > 0$.


Figure 11.11 Error variance proportional to X. 


Given Assumption 2, one can readily verify that $E(v_i^2) = \sigma^2$, a homoscedastic situation. Therefore, one may proceed to apply OLS to Eq. (11.6.8), regressing $Y_i/\sqrt{X_i}$ on $1/\sqrt{X_i}$ and $\sqrt{X_i}$.

Note an important feature of the transformed model: It has no intercept term. Therefore, one will have to use the regression-through-the-origin model to estimate $\beta_1$ and $\beta_2$. Having run Eq. (11.6.8), one can get back to the original model simply by multiplying Eq. (11.6.8) by $\sqrt{X_i}$.

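An interesting case is the zero intercept model, namely, $Y_i = \beta_2 X_i + u_i$. In this case, Eq. (11.6.8) becomes:

$$\frac{Y_i}{\sqrt{X_i}} = \beta_2\sqrt{X_i} + \frac{u_i}{\sqrt{X_i}} \tag{11.6.8a}$$

And it can be shown that

$$\hat{\beta}_2 = \frac{\bar{Y}}{\bar{X}} \tag{11.6.8b}$$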
That is, the weighted least-squares estimator is simply the ratio of the means of the dependent and explanatory variables. (To prove Eq. [11.6.8b], just apply the regression-through-the-origin formula given in Eq. [6.1.6].)
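Equation (11.6.8b) follows in one line: with $Z_i = \sqrt{X_i}$ and $W_i = Y_i/\sqrt{X_i}$, the through-the-origin formula $\hat{\beta}_2 = \sum Z_iW_i/\sum Z_i^2$ becomes $\sum Y_i/\sum X_i = \bar{Y}/\bar{X}$. The sketch below checks this numerically on hypothetical data; the function name is illustrative only.

```python
import math


def wls_zero_intercept(x, y):
    """Regress Y/sqrt(X) on sqrt(X) through the origin, as in Eq. (11.6.8a)."""
    z = [math.sqrt(xi) for xi in x]             # transformed regressor sqrt(X_i)
    w = [yi / zi for yi, zi in zip(y, z)]       # transformed regressand Y_i / sqrt(X_i)
    # Regression-through-the-origin formula: sum(z * w) / sum(z^2).
    return sum(zi * wi for zi, wi in zip(z, w)) / sum(zi * zi for zi in z)


x = [1.0, 2.0, 4.0, 9.0]                        # hypothetical data, all X_i > 0
y = [2.5, 3.0, 9.0, 16.5]
ratio_of_means = (sum(y) / len(y)) / (sum(x) / len(x))
print(wls_zero_intercept(x, y), ratio_of_means)   # both approximately 1.9375
```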


Assumption 3 The error variance is proportional to the square of the mean value of Y.
$$E(u_i^2) = \sigma^2 [E(Y_i)]^2 \tag{11.6.9}$$


Equation (11.6.9) postulates that the variance of $u_i$ is proportional to the square of the expected value of Y (see Figure 11.8e). Now


$$E(Y_i) = \beta_1 + \beta_2 X_i$$
Therefore, if we transform the original equation as follows, 


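$$\frac{Y_i}{E(Y_i)} = \frac{\beta_1}{E(Y_i)} + \beta_2\frac{X_i}{E(Y_i)} + \frac{u_i}{E(Y_i)} = \beta_1\left(\frac{1}{E(Y_i)}\right) + \beta_2\left(\frac{X_i}{E(Y_i)}\right) + v_i \tag{11.6.10}$$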


where $v_i = u_i/E(Y_i)$, it can be seen that $E(v_i^2) = \sigma^2$; that is, the disturbances $v_i$ are homoscedastic. Hence, it is regression (11.6.10) that will satisfy the homoscedasticity assumption of the classical linear regression model.

The transformation (11.6.10) is, however, not operational because $E(Y_i)$ depends on $\beta_1$ and $\beta_2$, which are unknown. Of course, we know $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$, which is an estimator of $E(Y_i)$. Therefore, we may proceed in two steps: First, we run the usual OLS regression, disregarding the heteroscedasticity problem, and obtain $\hat{Y}_i$. Then, using the estimated $\hat{Y}_i$, we transform our model as follows:


$$\frac{Y_i}{\hat{Y}_i} = \beta_1\left(\frac{1}{\hat{Y}_i}\right) + \beta_2\left(\frac{X_i}{\hat{Y}_i}\right) + v_i \tag{11.6.11}$$
where $v_i = u_i/\hat{Y}_i$. In Step 2, we run the regression (11.6.11). Although the $\hat{Y}_i$ are not exactly $E(Y_i)$, they are consistent estimators; that is, as the sample size increases indefinitely, they converge to the true $E(Y_i)$. Hence, the transformation (11.6.11) will perform satisfactorily in practice if the sample size is reasonably large.
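The two-step procedure for Assumption 3 can be sketched as follows. Step 1 is ordinary OLS; Step 2 regresses $Y_i/\hat{Y}_i$ on $1/\hat{Y}_i$ and $X_i/\hat{Y}_i$ with no intercept, solving the 2 × 2 normal equations by Cramer's rule. The function names and data are hypothetical, and the sketch assumes no fitted value $\hat{Y}_i$ is zero.

```python
def ols(x, y):
    """Two-variable OLS with intercept; returns (b1, b2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b2 * x_bar, b2


def ols_two_regressors_no_intercept(z1, z2, w):
    """Fit w = b1*z1 + b2*z2 by solving the 2x2 normal equations (Cramer's rule)."""
    s11 = sum(a * a for a in z1)
    s22 = sum(b * b for b in z2)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s1w = sum(a * c for a, c in zip(z1, w))
    s2w = sum(b * c for b, c in zip(z2, w))
    det = s11 * s22 - s12 * s12
    return (s1w * s22 - s2w * s12) / det, (s2w * s11 - s1w * s12) / det


def two_step_wls(x, y):
    """Assumption 3: error variance proportional to [E(Y_i)]^2."""
    # Step 1: plain OLS, ignoring heteroscedasticity, to get fitted values.
    b1, b2 = ols(x, y)
    y_hat = [b1 + b2 * xi for xi in x]
    if any(abs(v) < 1e-12 for v in y_hat):
        raise ValueError("a fitted value is (numerically) zero; cannot divide by it")
    # Step 2: divide through by the fitted values and regress without an intercept.
    z1 = [1 / v for v in y_hat]
    z2 = [xi / v for xi, v in zip(x, y_hat)]
    w = [yi / v for yi, v in zip(y, y_hat)]
    return ols_two_regressors_no_intercept(z1, z2, w)


# Hypothetical data: roughly Y = 3 + 1.5X with errors that grow with the level of Y.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [4.4, 6.1, 7.2, 9.6, 10.1, 12.9, 12.8, 16.6]
print(two_step_wls(x, y))
```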


Assumption 4 A log transformation such as 


$$\ln Y_i = \beta_1 + \beta_2 \ln X_i + u_i \tag{11.6.12}$$


very often reduces heteroscedasticity when compared with the regression $Y_i = \beta_1 + \beta_2 X_i + u_i$.


This result arises because the log transformation compresses the scales in which the variables are measured, thereby reducing a tenfold difference between two values to a twofold difference. Thus, the number 80 is 10 times the number 8, but ln 80 (= 4.3820) is about twice as large as ln 8 (= 2.0794).
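The arithmetic behind the compression argument is easy to check: the logs of 80 and 8 differ by ln 10 ≈ 2.3026, so their ratio is only about 2.1, while the levels differ by a factor of 10.

```python
import math

levels_ratio = 80 / 8                       # a tenfold difference in levels
log_80, log_8 = math.log(80), math.log(8)   # natural logs
logs_ratio = log_80 / log_8                 # about a twofold difference

print(round(log_80, 4), round(log_8, 4))    # 4.382 2.0794
print(levels_ratio, round(logs_ratio, 2))   # 10.0 2.11
# The gap between the logs is ln(80/8) = ln 10, whatever the levels happen to be.
print(round(log_80 - log_8, 4), round(math.log(10), 4))   # 2.3026 2.3026
```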

An additional advantage of the log transformation is that the slope coefficient $\beta_2$ measures the elasticity of Y with respect to X, that is, the percentage change in Y for a percentage change in X. For example, if Y is consumption and X is income, $\beta_2$ in Eq. (11.6.12) will measure income elasticity, whereas in the original model $\beta_2$ measures only the rate of change of mean consumption for a unit change in income. This is one reason why log models are quite popular in empirical econometrics. (For some of the problems associated with log transformation, see Exercise 11.4.)

To conclude our discussion of the remedial measures, we reemphasize that all the transformations discussed previously are ad hoc; we are essentially speculating about the nature of $\sigma_i^2$. Which of these transformations will work depends on the nature of the problem and the severity of heteroscedasticity. There are some additional problems with the transformations we have considered that should be borne in mind:

1. When we go beyond the two-variable model, we may not know a priori which of the X variables should be chosen for transforming the data.39

2. Log transformation as discussed in Assumption 4 is not applicable if some of the Y and X values are zero or negative.40

3. Then there is the problem of spurious correlation. This term, due to Karl Pearson, refers to the situation where correlation is found to be present between the ratios of variables even though the original variables are uncorrelated or random.41 Thus, in the model $Y_i = \beta_1 + \beta_2 X_i + u_i$, Y and X may not be correlated, but in the transformed model $Y_i/X_i = \beta_1(1/X_i) + \beta_2$, $Y_i/X_i$ and $1/X_i$ are often found to be correlated.

4. When the $\sigma_i^2$ are not directly known and are estimated from one or more of the transformations that we have discussed earlier, all our testing procedures using the t tests, F tests, etc., are, strictly speaking, valid only in large samples. Therefore, one has to be careful in interpreting the results based on the various transformations in small or finite samples.42
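Pearson's point in item 3 can be seen in a small simulation: three mutually independent variables, yet two ratios that share the same denominator turn out to be correlated. The uniform(1, 2) distribution, the sample size, and the seed are arbitrary choices for illustration; the `pearson` helper is written out so the sketch does not depend on a particular Python version.

```python
import random


def pearson(a, b):
    """Sample Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


rng = random.Random(11)                        # fixed seed for reproducibility
n = 10_000
x1 = [rng.uniform(1, 2) for _ in range(n)]     # three mutually independent variables
x2 = [rng.uniform(1, 2) for _ in range(n)]
x3 = [rng.uniform(1, 2) for _ in range(n)]

r_raw = pearson(x1, x2)                        # close to zero
r_ratio = pearson([a / c for a, c in zip(x1, x3)],
                  [b / c for b, c in zip(x2, x3)])   # clearly positive
print(round(r_raw, 3), round(r_ratio, 3))
```

The common denominator $X_3$ moves both ratios together, which is exactly the "correlation induced by the method of handling the data" described in footnote 41.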


11.7 Concluding Examples 


In concluding our discussion of heteroscedasticity we present three examples illustrating the main points 
made in this chapter. 


Example 11.9 Child Mortality Revisited 


Let us return to the child mortality example we have considered on several occasions. From data for 64 
countries, we obtained the regression results shown in Eq. (8.1.4). Since the data are cross-sectional, involving 
diverse countries with different child mortality experiences, it is likely that we might encounter heteroscedas- 
ticity. To find this out, let us first consider the residuals obtained from Eq. (8.1.4). These residuals are plotted in 
Figure 11.12. From this figure it seems that the residuals do not show any distinct pattern that might suggest
heteroscedasticity. Nonetheless, appearances can be deceptive. So, let us apply the Park, Glejser, and White 
tests to see if there is any evidence of heteroscedasticity. 


Park Test. Since there are two regressors, GNP and FLR, we can regress the squared residuals from regression (8.1.4) on either of these variables. Or, we can regress them on the estimated CM values ($= \widehat{CM}$) from regression (8.1.4). Using the latter, we obtained the following results.


39 However, as a practical matter, one may plot $\hat{u}_i^2$ against each variable and decide which X variable may be used for transforming the data. (See Fig. 11.9.)

40 Sometimes we can use $\ln(Y_i + k)$ or $\ln(X_i + k)$, where k is a positive number chosen in such a way that all the values of Y and X become positive.

41 For example, if $X_1$, $X_2$, and $X_3$ are mutually uncorrelated ($r_{12} = r_{13} = r_{23} = 0$) and we find that the (values of the) ratios $X_1/X_3$ and $X_2/X_3$ are correlated, then there is spurious correlation. “More generally, correlation may be described as spurious if it is induced by the method of handling the data and is not present in the original material.” M. G. Kendall and W. R. Buckland, A Dictionary of Statistical Terms, Hafner Publishing, New York, 1972, p. 143.

42 For further details, see George G. Judge et al., op. cit., Section 14.4, pp. 415-420.
$$\begin{aligned}
\hat{u}_i^2 &= 854.4006 + 5.7016\,\widehat{CM}_i \\
t &= (1.2010) \qquad (1.2428) \qquad r^2 = 0.024
\end{aligned} \tag{11.7.1}$$

Note: $\hat{u}_i$ are the residuals obtained from regression (8.1.4) and $\widehat{CM}$ are the estimated values of CM from regression (8.1.4).

As this regression shows, there is no systematic relation between the squared residuals and the estimated CM values (why?), suggesting that the assumption of homoscedasticity may be valid. Incidentally, regressing the log of the squared residual values on the log of $\widehat{CM}$ did not change the conclusion.


Glejser Test. The absolute values of the residuals obtained from Eq. (8.1.4), when regressed on the estimated CM values from the same regression, gave the following results:

$$\begin{aligned}
|\hat{u}_i| &= 22.3127 + 0.0646\,\widehat{CM}_i \\
t &= (2.8086) \qquad (1.2622) \qquad r^2 = 0.0250
\end{aligned} \tag{11.7.2}$$
Again, there is not much systematic relationship between the absolute values of the residuals and the estimated 
CM values, as the t value of the slope coefficient is not statistically significant. 
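Both auxiliary regressions are ordinary two-variable OLS fits on transformed residuals, so one helper covers them. In this minimal sketch the regressor `z` could be $X_i$ or the fitted values $\widehat{CM}_i$ and must be positive for the Park (log) form; the residuals are constructed by hand (hypothetical) so that $|\hat{u}_i| = 0.5X_i$ exactly, which means the Glejser slope should be 0.5 and the Park slope should be 2. Zero residuals would have to be dropped before taking logs.

```python
import math


def simple_ols(x, y):
    """Two-variable OLS with intercept; returns (b1, b2, t ratio of b2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    se_b2 = math.sqrt(rss / (n - 2) / sxx)
    t_b2 = b2 / se_b2 if se_b2 > 0 else math.copysign(math.inf, b2)
    return b1, b2, t_b2


def park_test(resid, z):
    """Park: regress ln(u_hat^2) on ln(z); a significant slope suggests heteroscedasticity."""
    return simple_ols([math.log(zi) for zi in z], [math.log(u * u) for u in resid])


def glejser_test(resid, z):
    """Glejser (linear form): regress |u_hat| on z."""
    return simple_ols(z, [abs(u) for u in resid])


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
resid = [0.5 * xi * (-1) ** i for i, xi in enumerate(x)]   # hypothetical residuals
print(park_test(resid, x)[:2], glejser_test(resid, x)[:2])
```

With real residuals, as in Eqs. (11.7.1) and (11.7.2), the decision rests on the t ratio of the slope (the third element returned).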
White Test. Applying White’s heteroscedasticity test with and without cross-product terms, we did not 
find any evidence of heteroscedasticity. We also reestimated Eq. (8.1.4) to obtain White’s heteroscedasticity- 
consistent standard errors and t values, but the results were quite similar to those given in Eq. (8.1.4), which 
should not be surprising in view of the various heteroscedasticity tests we conducted earlier. 

In sum, it seems that our child mortality regression (8.1.4) does not suffer from heteroscedasticity. 


Figure 11.12 Residuals from regression (8.1.4).


Example 11.10 R&D Expenditure, Sales and Profits in 16 Industry Groupings in India, 2009-10


Table 11.5 gives data on research and development (R&D) expenditure, sales, and profits for 16 industry groupings (as given in Prowess) in India (all figures in Rs. crore). Since the cross-sectional data presented in this table are quite heterogeneous, heteroscedasticity is likely in a regression of R&D on sales. The regression results are as follows:


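$$\begin{aligned}
\widehat{\text{R\&D}}_i &= -20.179 + 0.003\,\text{Sales}_i \\
\text{se} &= (298.562) \qquad (0.002) \\
t &= (-0.068) \qquad (2.040) \qquad R^2 = 0.2291
\end{aligned} \tag{11.7.3}$$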


Not surprisingly, there is a positive relationship between R&D and sales, statistically significant at about the 6 percent level of significance (P = 0.061).
Table 11.5 Research and Development, Sales and Profit Data for Indian Industry Groups, 2009-10, Measured in Rs. Crore

Industry Group                                R&D Expenses           Sales    Profit after Tax
Electricity                                          64.54      288,651.52           17,260.19
Food and beverages                                  361.07      232,186.13           12,281.28
Industrial & infrastructural construction           109.84      177,101.66           13,430.96
Inorganic chemicals                                   8.50          399505              151510
Leather products                                      4.47        4,998.01              233.72
Machinery                                         2,104.80      258,916.47           15,096.22
Metals & metal products                             268.75      376,770.90           30,255.87
Mining                                              384.08      161,672.57           35,436.87
Misc. manufactured articles                          46.82        7,914.46         [illegible]
Non-metallic mineral products                        95.99      166,542.44           13,062.95
Organic chemicals                                   160.38       20,106.90            1,409.55
Paper, newsprint and paper products                  21.44       18,056.43              496.25
Plastic products                                     60.29       31,892.09            2,205.88
Rubber and rubber products                           30.68       -3,764.01              859.99
Textiles                                             77.76      128,021.16            9,199.26
Transport equipment                               3,120.45      275,753.44           18,201.64

Source: Prowess, Centre for Monitoring Indian Economy, Mumbai.


To see if the regression (11.7.3) suffers from heteroscedasticity, we obtained the residuals, $\hat{u}_i$, and the squared residuals, $\hat{u}_i^2$, from the model and plotted them against sales, as shown in Figure 11.13. This figure suggests a systematic relationship between sales and both the residuals and the squared residuals, perhaps indicating heteroscedasticity. To test this formally, we used the Park, Glejser, and White tests, which gave the following results:


Park Test 


$$\begin{aligned}
\hat{u}_i^2 &= -167371.53 + 5.39\,\text{Sales}_i \\
\text{se} &= (403943.17) \qquad (2.23) \\
t &= (-0.41) \qquad (2.42) \qquad R^2 = 0.2944
\end{aligned} \tag{11.7.4}$$

The Park test suggests that there is a statistically significant positive relationship between the squared residuals and sales.


Glejser Test 
$$\begin{aligned}
|\hat{u}_i| &= -45.417 + 0.004\,\text{Sales}_i \\
\text{se} &= (148.819) \qquad (0.001) \\
t &= (-0.305) \qquad (4.575) \qquad R^2 = 0.5992
\end{aligned} \tag{11.7.5}$$


The Glejser test also suggests that there is a systematic relationship between the absolute values of the residuals 
and sales, raising the possibility that the regression (11.7.3) suffers from heteroscedasticity. 


White Test 
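$$\begin{aligned}
\hat{u}_i^2 &= -128442.58 + 4.11\,\text{Sales}_i + 0.000004\,\text{Sales}_i^2 \\
\text{se} &= (472903.39) \qquad (7.59) \qquad (0.000023) \\
t &= (-0.27) \qquad (0.54) \qquad (0.18) \qquad R^2 = 0.2961
\end{aligned} \tag{11.7.6}$$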
Figure 11.13 Residuals (a) and squared residuals (b) on sales


Using the $R^2$ value and n = 16, we obtain $nR^2 = 4.74$. Under the null hypothesis of no heteroscedasticity, this statistic asymptotically follows a chi-square distribution with 2 df (because there are two regressors in Eq. [11.7.6]). From the chi-square table, the 10 percent critical value is 4.60517; since 4.74 exceeds it, we reject the hypothesis of homoscedasticity at the 10 percent level (though not at the 5 percent level, where the critical value is about 5.991).
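The decision rule can be verified with a few lines of Python. For 2 df the chi-square distribution is exponential with mean 2, so its upper-tail probability is exp(−x/2) and no statistical tables are needed; for other degrees of freedom one would use a library routine such as SciPy's `scipy.stats.chi2.sf` instead.

```python
import math

n, r2 = 16, 0.2961        # sample size and R^2 of the auxiliary regression (11.7.6)
df = 2                    # number of regressors in (11.7.6), as stated in the text
stat = n * r2             # White's nR^2 statistic

# The closed form below is valid only for df = 2.
assert df == 2
p_value = math.exp(-stat / 2)     # upper-tail probability P(chi2_2 > stat)
crit_10 = -2 * math.log(0.10)     # 10 percent critical value
crit_05 = -2 * math.log(0.05)     # 5 percent critical value

print(round(stat, 2), round(p_value, 3), round(crit_10, 5), round(crit_05, 5))
# 4.74 0.094 4.60517 5.99146
```

A p value of about 0.09 is exactly what "significant at 10 percent but not at 5 percent" means.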

In sum, then, on the basis of the residual graphs and the Park, Glejser, and White tests it seems that 
our R&D regression (11.7.3) suffers from heteroscedasticity. Since the true error variance is unknown, we 
cannot use the method of weighted least squares to obtain heteroscedasticity-corrected standard errors and 
t values. Therefore, we would have to make some educated guesses about the nature of the error variance. 
White’s heteroscedasticity-consistent standard errors could be used, but remember that White’s procedure is strictly a large-sample procedure, whereas we have only 16 observations.
Example 11.11


Table 11.6 on the textbook website provides salary and related data on 94 school districts in Northwest Ohio. 
Initially, the following regression was estimated from these data: 


$$\ln(\text{Salary})_i = \beta_1 + \beta_2 \ln(\text{Famincome})_i + \beta_3 \ln(\text{Propvalue})_i + u_i$$


where Salary = mean salary of classroom teachers ($), Famincome = mean family income in the district ($), and Propvalue = mean property value in the district ($).

Since this is a double-log model, all the slope coefficients are elasticities. Based on the various heteroscedas- 
ticity tests discussed in the text, it was found that the preceding model suffered from heteroscedasticity. We, 
therefore, obtained (White’s) robust standard errors. The following table gives the results of the preceding 
regression with and without robust standard errors. 


Variable          Coefficient      OLS se     Robust se
Intercept            7.0198        0.8053       0.7721
                                  (8.7171)     (9.0908)
ln(famincome)        0.2575        0.0799       0.1009
                                  (3.2230)     (2.5516)
ln(propvalue)        0.0704        0.0207       0.0460
                                  (3.3976)     (1.5311)
R²                   0.2198

Note: Figures in parentheses are the estimated t ratios.


Although the coefficient values and $R^2$ remain the same whether we use OLS or White’s method, the standard errors have changed; the most dramatic change is in the standard error of the ln(propvalue) coefficient. The usual OLS would suggest that the estimated coefficient of this variable is highly statistically significant, whereas White’s robust standard error suggests that this coefficient is not significant even at the 10 percent level. The point of this example is that if there is heteroscedasticity, we should take it into account in estimating a model.


11.8 A Caution about Overreacting to Heteroscedasticity 


Returning to the R&D example discussed in the previous section, we saw that the regression showed evidence of the presence of heteroscedasticity. Is this problem so significant that one should worry about it in practice? To put the matter differently, when should we really worry about the heteroscedasticity problem? As one author contends, “heteroscedasticity has never been a reason to throw out an otherwise good model.”43 Here it may be useful to bear in mind the caution sounded by John Fox:


...unequal error variance is worth correcting only when the problem is severe.

The impact of nonconstant error variance on the efficiency of ordinary least-squares estimator and on the validity of least-squares inference depends on several factors, including the sample size, the degree of variation in the $\sigma_i^2$, the configuration of the X [i.e., regressor] values, and the relationship between the error variance and the X’s. It is therefore not possible to develop wholly general conclusions concerning the harm produced by heteroscedasticity.44


43 N. Gregory Mankiw, “A Quick Refresher Course in Macroeconomics,” Journal of Economic Literature, vol. XXVIII, December 1990, p. 1648.

44 John Fox, Applied Regression Analysis, Linear Models, and Related Methods, Sage Publications, California, 1997, p. 306.
Returning to the model (11.3.1), we saw earlier that the variance of the OLS slope estimator, $\text{var}(\hat{\beta}_2)$, is given by the formula shown in (11.2.3). Under GLS the variance of the slope estimator, $\text{var}(\hat{\beta}_2^*)$, is given by (11.3.9). We know that the GLS estimator is more efficient than the OLS estimator. But how large does the OLS variance have to be in relation to the GLS variance before one should really worry about it? As a rule of thumb, Fox suggests that we worry about this problem “...when the largest error variance is more than about 10 times the smallest.”45 Thus, returning to the Monte Carlo simulation results of Davidson and MacKinnon presented in Section 11.4, consider the value of $\alpha = 2$. The variance of the estimated $\beta_2$ is 0.04 under OLS and 0.012 under GLS, the ratio of the former to the latter thus being about 3.33.46 According to the Fox rule, the severity of heteroscedasticity in this case may not be large enough to worry about.
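When the $\sigma_i^2$ are known, the OLS-versus-GLS efficiency comparison can be computed exactly, without simulation. The sketch below assumes the pattern $\sigma_i^2 = X_i^{\alpha}$ (with $\sigma^2 = 1$) and uses the standard expressions $\sum x_i^2\sigma_i^2/(\sum x_i^2)^2$ for the OLS slope variance and $\sum w_i/[\sum w_i \sum w_i X_i^2 - (\sum w_i X_i)^2]$ with $w_i = 1/\sigma_i^2$ for the GLS slope variance, which we take to be the formulas referred to as (11.2.3) and (11.3.9). The X values are hypothetical and are not Davidson and MacKinnon's design, so the ratios will not reproduce 3.33.

```python
def slope_variances(x, sigma2):
    """Exact OLS and GLS (WLS) sampling variances of the slope for known sigma_i^2."""
    n = len(x)
    x_bar = sum(x) / n
    dev = [xi - x_bar for xi in x]                  # deviations from the mean
    sxx = sum(d * d for d in dev)
    var_ols = sum(d * d * s for d, s in zip(dev, sigma2)) / sxx ** 2
    w = [1 / s for s in sigma2]                     # GLS weights 1 / sigma_i^2
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swx2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    var_gls = sw / (sw * swx2 - swx ** 2)
    return var_ols, var_gls


x = [float(i) for i in range(1, 21)]   # hypothetical regressor values 1, 2, ..., 20
for alpha in (0.0, 1.0, 2.0, 3.0):
    sigma2 = [xi ** alpha for xi in x]  # sigma_i^2 = X_i^alpha, taking sigma^2 = 1
    v_ols, v_gls = slope_variances(x, sigma2)
    print(f"alpha = {alpha}: var_OLS / var_GLS = {v_ols / v_gls:.2f}")
```

With α = 0 the ratio is exactly 1, as it must be under homoscedasticity; it can never fall below 1, since GLS is BLUE.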

Also remember that, despite heteroscedasticity, OLS estimators are linear unbiased and are (under general 
conditions) asymptotically (i.e., in large samples) normally distributed. 

As we will see when we discuss other violations of the assumptions of the classical linear regression 
model, the caution sounded in this section is appropriate as a general rule. Otherwise, one can go overboard. 


Summary and Conclusions 


1. A critical assumption of the classical linear regression model is that the disturbances $u_i$ all have the same variance, $\sigma^2$. If this assumption is not satisfied, there is heteroscedasticity.

2. Heteroscedasticity does not destroy the unbiasedness and consistency properties of OLS estimators.

3. But these estimators are no longer minimum variance or efficient. That is, they are not BLUE.

4. The BLUE estimators are provided by the method of weighted least squares, provided the heteroscedastic error variances, $\sigma_i^2$, are known.

5. In the presence of heteroscedasticity, the variances of OLS estimators are not provided by the usual 
OLS formulas. But if we persist in using the usual OLS formulas, the t and F tests based on them can 
be highly misleading, resulting in erroneous conclusions. 

6. Documenting the consequences of heteroscedasticity is easier than detecting it. There are several 
diagnostic tests available, but one cannot tell for sure which will work in a given situation. 

7. Even if heteroscedasticity is suspected and detected, it is not easy to correct the problem. If the sample 
is large, one can obtain White’s heteroscedasticity-corrected standard errors of OLS estimators and 
conduct statistical inference based on these standard errors. 

8. Otherwise, on the basis of OLS residuals, one can make educated guesses of the likely pattern of 
heteroscedasticity and transform the original data in such a way that in the transformed data there is no 
heteroscedasticity. 




Multiple Choice Questions 


1. Heteroscedasticity means that 
a. All X variables cannot be assumed to be homogeneous 
b. The variance of the error term is not constant 
c. The observed units have no relation 
d. The X and Y are not correlated 


45 Ibid., p. 307.
46 Note that we have squared the standard errors to obtain the variances.
2. Heteroscedasticity may result from the presence of
a. Outliers in the sample
b. Omission of an important explanatory variable from the model
c. Skewness in the distribution of regressors in the model
d. All of the above
3. Heteroscedasticity is more likely a problem of
a. Cross-section data
b. Time series data
c. Pooled data
d. All of the above


4. The coefficients estimated in the presence of heteroscedasticity are NOT
a. Unbiased estimators
b. Consistent estimators
c. Efficient estimators
d. Linear estimators
5. Estimating the regression model in the presence of heteroscedasticity using this method leads to BLUE estimators
a. OLS
b. GLS
c. MLE
d. Two-stage regression estimation
6. The estimation method that accounts for the fact that observations coming from a population with greater variability are given less weight than those coming from a population with smaller variability is
a. OLS
b. GLS
c. MLE
d. Two-stage regression estimation


7. In the regression model $Y_i = \beta_1 X_{0i} + \beta_2 X_i + u_i$, if $\beta_1$ is the intercept coefficient, then the values that $X_{0i}$ can take are
a. All ones
b. All zeros
c. Any value
d. Any positive value
8. Using the OLS estimation technique in the presence of heteroscedasticity will lead to
a. Easy acceptance of statistically significant coefficients using t and F tests
b. Easy rejection of statistically significant coefficients using t and F tests
c. The t and F tests still being accurate
d. The t test giving accurate results while the F test does not
9. Under the Park test, $\ln \sigma_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i$ is the suggested regression model. Here if we find $\beta$ to be statistically significantly different from zero, this means that
a. The homoscedasticity assumption is satisfied
b. The homoscedasticity assumption is not satisfied
c. We need further testing
d. $X_i$ has an impact on $Y_i$
10. According to Goldfeld and Quandt, the problem with the Park test is that the
a. Error term is heteroscedastic
b. Expected value of $v_i$ is nonzero
c. $v_i$ is serially correlated
d. Model is nonlinear in parameters
11. Under Spearman’s rank correlation test for heteroscedasticity, the null hypothesis tested is that
a. There is no heteroscedasticity in the sample data
b. There is heteroscedasticity in the sample data
c. There is positive heteroscedasticity in the sample data
d. There is negative heteroscedasticity in the sample data
12. The heteroscedasticity test that is sensitive to the normality assumption of the error term is
a. Goldfeld-Quandt test
b. Breusch-Pagan-Godfrey test
c. White’s general heteroscedasticity test
d. Spearman’s rank correlation test
13. Reordering of observations with respect to the explanatory variable is the first step in conducting the following heteroscedasticity test
a. Goldfeld-Quandt test
b. Breusch-Pagan-Godfrey test
c. White’s general heteroscedasticity test
d. Spearman’s rank correlation test
14. This test is a test of both heteroscedasticity and specification error
a. Goldfeld-Quandt test
b. Breusch-Pagan-Godfrey test
c. White’s general heteroscedasticity test
d. Spearman’s rank correlation test
15. The following remedial measure for heteroscedasticity is used when $\sigma_i^2$ is known for a regression model
a. Koenker-Bassett method
b. Weighted least squares method
c. OLS method
d. White’s procedure
16. Which of the following is NOT considered an assumption about the pattern of heteroscedasticity?
a. The error variance is proportional to $X_i$
b. The error variance is proportional to $Y_i$
c. The error variance is proportional to $X_i^2$
d. The error variance is proportional to the square of the mean value of Y
17. Even if heteroscedasticity is suspected and detected, it is not easy to correct the problem. This statement is
a. True
b. False
c. Sometimes true
d. Depends on test statistics used
18. Heteroscedasticity may arise due to various reasons. Which one of these is NOT a reason?
a. Extremely low or high values of X and Y coordinates in the dataset
b. Correlation of variables over time
c. Incorrect specification of the functional form of the model
d. Incorrect transformation of variables
19. The Park test is a
a. One-stage procedure
b. Two-stage procedure
c. Three-stage procedure
d. Four-stage procedure
20. For testing heteroscedasticity, we first obtain the OLS estimates of the parameters for all the testing procedures except the
a. Park test
b. Glejser test
c. Spearman’s rank correlation test
d. Graphical test


Exercises 


Questions 


11.1. State with brief reason whether the following statements are true, false, or uncertain:

a. In the presence of heteroscedasticity OLS estimators are biased as well as inefficient. 

b. If heteroscedasticity is present, the conventional t and F tests are invalid. 

c. In the presence of heteroscedasticity the usual OLS method always overestimates the standard 
errors of estimators. 

d. If residuals estimated from an OLS regression exhibit a systematic pattern, it means heteroscedas- 
ticity is present in the data. 

e. There is no general test of heteroscedasticity that is free of any assumption about which variable 
the error term is correlated with. 

f. If a regression model is mis-specified (e.g., an important variable is omitted), the OLS residuals will show a distinct pattern.

g. If a regressor that has nonconstant variance is (incorrectly) omitted from a model, the (OLS) 
residuals will be heteroscedastic. 


11.2. In a regression of average wages (W, in dollars) on the number of employees (N) for a random sample of 30 firms, the following regression results were obtained:*

$$\hat{W} = 7.5 + 0.009N \qquad (1)$$
t = n.a. (16.10)    $R^2 = 0.90$

$$\widehat{W/N} = 0.008 + 7.8(1/N) \qquad (2)$$
t = (14.43) (76.58)    $R^2 = 0.99$


a. How do you interpret the two regressions? 

b. What is the author assuming in going from Eq. (1) to Eq. (2)? Was he worried about heteroscedas- 
ticity? How do you know? 

c. Can you relate the slopes and intercepts of the two models? 

d. Can you compare the $R^2$ values of the two models? Why or why not?


*See Dominick Salvatore, Managerial Economics, McGraw-Hill, New York, 1989, p. 157.
11.3. a. Can you estimate the parameters of the models

$$|\hat{u}_i| = \sqrt{\beta_1 + \beta_2 X_i} + v_i$$

$$|\hat{u}_i| = \sqrt{\beta_1 + \beta_2 X_i^2} + v_i$$

by the method of ordinary least squares? Why or why not?
b. If not, can you suggest a method, informal or formal, of estimating the parameters of such models? (See Chapter 14.)

11.4. Although log models as shown in Eq. (11.6.12) often reduce heteroscedasticity, one has to pay careful attention to the properties of the disturbance term of such models. For example, the model

$$Y_i = \beta_1 X_i^{\beta_2} u_i \qquad (1)$$

can be written as

$$\ln Y_i = \ln \beta_1 + \beta_2 \ln X_i + \ln u_i \qquad (2)$$

a. If $\ln u_i$ is to have zero expectation, what must be the distribution of $u_i$?
b. If $E(u_i) = 1$, will $E(\ln u_i) = 0$? Why or why not?
c. If $E(\ln u_i)$ is not zero, what can be done to make it zero?

11.5. Show that $\hat{\beta}_2^*$ of Eq. (11.3.8) can also be expressed as

$$\hat{\beta}_2^* = \frac{\sum w_i y_i^* x_i^*}{\sum w_i x_i^{*2}}$$

and $\operatorname{var}(\hat{\beta}_2^*)$ given in Eq. (11.3.9) can also be expressed as

$$\operatorname{var}(\hat{\beta}_2^*) = \frac{1}{\sum w_i x_i^{*2}}$$

where $y_i^* = Y_i - \bar{Y}^*$ and $x_i^* = X_i - \bar{X}^*$ represent deviations from the weighted means $\bar{Y}^*$ and $\bar{X}^*$ defined as

$$\bar{Y}^* = \sum w_i Y_i \Big/ \sum w_i$$
$$\bar{X}^* = \sum w_i X_i \Big/ \sum w_i$$

11.6. For pedagogic purposes Hanushek and Jackson estimate the following model:

$$C_t = \beta_1 + \beta_2 \text{GNP}_t + \beta_3 D_t + u_t \qquad (1)$$

where $C_t$ = aggregate private consumption expenditure in year t, $\text{GNP}_t$ = gross national product in year t, and $D_t$ = national defense expenditures in year t, the objective of the analysis being to study the effect of defense expenditures on other expenditures in the economy.

Postulating that $\sigma_t^2 = \sigma^2 (\text{GNP}_t)^2$, they transform (1) and estimate

$$C_t/\text{GNP}_t = \beta_1 (1/\text{GNP}_t) + \beta_2 + \beta_3 (D_t/\text{GNP}_t) + u_t/\text{GNP}_t \qquad (2)$$
The empirical results based on the data for 1946–1975 were as follows (standard errors in parentheses):*

$$\hat{C}_t = \ldots + 0.6248\,\text{GNP}_t - 0.4398\,D_t$$
(2.73) (0.0060) (0.0736)    $R^2 = 0.999$

$$\widehat{C_t/\text{GNP}_t} = 25.92(1/\text{GNP}_t) + 0.6246 - 0.4315(D_t/\text{GNP}_t)$$
(2.22) (0.0068) (0.0597)    $R^2 = 0.875$

a. What assumption do the authors make about the nature of heteroscedasticity? Can you justify it?

b. Compare the results of the two regressions. Has the transformation of the original model improved the results, that is, reduced the estimated standard errors? Why or why not?

c. Can you compare the two $R^2$ values? Why or why not? (Hint: Examine the dependent variables.)

11.7. Refer to the estimated regressions in Eqs. (11.6.2) and (11.6.3). The regression results are quite similar. What could account for this outcome?

11.8. Prove that if $w_i = w$, a constant, for each i, then $\hat{\beta}_2^*$ and $\hat{\beta}_2$, as well as their variances, are identical.

11.9. Refer to formulas (11.2.2) and (11.2.3). Assume

$$\sigma_i^2 = \sigma^2 k_i$$

where $\sigma^2$ is a constant and where the $k_i$ are known weights, not necessarily all equal.

Using this assumption, show that the variance given in Eq. (11.2.2) can be expressed as

$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_i^2} \cdot \frac{\sum k_i x_i^2}{\sum x_i^2}$$

The first term on the right side is the variance formula given in Eq. (11.2.3), that is, $\operatorname{var}(\hat{\beta}_2)$ under homoscedasticity. What can you say about the nature of the relationship between $\operatorname{var}(\hat{\beta}_2)$ under heteroscedasticity and under homoscedasticity? (Hint: Examine the second term on the right side of the preceding formula.) Can you draw any general conclusions about the relationship between Eqs. (11.2.2) and (11.2.3)?

11.10. In the model

$$Y_i = \beta_2 X_i + u_i \qquad \text{(Note: there is no intercept)}$$

you are told that $\operatorname{var}(u_i) = \sigma^2 X_i^2$. Show that

$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2 \sum X_i^4}{\left(\sum X_i^2\right)^2}$$


Empirical Exercises 


11.11. For the regression model given in Eq. (11.5.3), find the rank correlation between $|\hat{u}_i|$ and $X_i$ and comment on the nature of heteroscedasticity, if any, present in the data.


*Eric A. Hanushek and John E. Jackson, Statistical Methods for Social Scientists, Academic, New York, 1977, p. 160.
11.12. Table 11.6 gives data on the sales/cash ratio in U.S. manufacturing industries classified by the asset
size of the establishment for the period 1971-I to 1973-IV. (The data are on a quarterly basis.) The
sales/cash ratio may be regarded as a measure of income velocity in the corporate sector, that is, the 
number of times a dollar turns over. 

a. For each asset size compute the mean and standard deviation of the sales/cash ratio. 

b. Plot the mean value against the standard deviation as computed in (a), using asset size as the unit 
of observation. 

c. By means of a suitable regression model, decide whether the standard deviation of the ratio increases
with the mean value. If not, how would you rationalize the result?

d. If there is a statistically significant relationship between the two, how would you transform the data 
so that there is no heteroscedasticity? 

11.13. Bartlett's homogeneity-of-variance test. Suppose there are k independent sample variances $s_1^2, s_2^2, \ldots, s_k^2$ with $f_1, f_2, \ldots, f_k$ df, each from populations which are normally distributed with mean $\mu$ and variance $\sigma_i^2$. Suppose further that we want to test the null hypothesis $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$; that is, each sample variance is an estimate of the same population variance $\sigma^2$.

If the null hypothesis is true, then

$$s^2 = \frac{\sum_{i=1}^{k} f_i s_i^2}{\sum f_i} = \frac{\sum f_i s_i^2}{f}$$
Table 11.6
                                     Asset Size (millions of dollars)
Year and Quarter   1-10    10-25   25-50   50-100        100-250   250-1,000   1,000+
1971-I             6.696   6.929   6.858   6.966         7.819     7.557       7.860
1971-II            6.826   7.311   7.299   7.081         7.907     7.685       7.351
1971-III           6.338   7.035   7.082   7.145         7.691     7.309       7.088
1971-IV            6.272   6.265   6.874   6.485         6.778     7.120       6.765
1972-I             6.692   6.236   7.101   7.060         7.104     7.584       6.717
1972-II            6.818   7.010   7.719   7.009         8.064     7.457       7.280
1972-III           6.783   6.934   7.182   6.923         7.784     7.142       6.619
1972-IV            6.779   6.988   6.531   7.146         7.279     6.928       6.919
1973-I             7.291   7.428   7.272   [illegible]   7.583     7.053       6.630
1973-II            7.766   9.071   7.818   8.692         8.608     7.571       6.805
1973-III           7.733   8.357   8.090   8.357         7.680     7.654       6.772
1973-IV            8.316   7.621   7.766   7.867         7.666     7.380       7.072

Source: Quarterly Financial Report for Manufacturing Corporations, Federal Trade Commission and the Securities and Exchange Commission, U.S. government, various issues (computed).


*See "Properties of Sufficiency and Statistical Tests," Proceedings of the Royal Society of London A, vol. 160, 1937, p. 268.
provides an estimate of the common (pooled) estimate of the population variance $\sigma^2$, where $f_i = (n_i - 1)$, $n_i$ being the number of observations in the ith group and where $f = \sum_{i=1}^{k} f_i$.

Bartlett has shown that the null hypothesis can be tested by the ratio A/B, which is approximately distributed as the $\chi^2$ distribution with k − 1 df, where

$$A = f \ln s^2 - \sum \left(f_i \ln s_i^2\right)$$

and

$$B = 1 + \frac{1}{3(k-1)} \left[\sum \left(\frac{1}{f_i}\right) - \frac{1}{f}\right]$$

Apply Bartlett's test to the data of Table 11.1 and verify that the hypothesis that population variances of employee compensation are the same in each employment size of the establishment cannot be rejected at the 5 percent level of significance.

Note: $f_i$, the df for each sample variance, is 9, since $n_i$ for each sample (i.e., employment class) is 10.

11.14. Consider the following regression-through-the-origin model:

$$Y_i = \beta X_i + u_i, \qquad i = 1, 2$$

You are told that $u_1 \sim N(0, \sigma^2)$ and $u_2 \sim N(0, 2\sigma^2)$ and that they are statistically independent. If $X_1 = +1$ and $X_2 = -1$, obtain the weighted least-squares (WLS) estimate of $\beta$ and its variance. If in this situation you had assumed incorrectly that the two error variances were the same (say, equal to $\sigma^2$), what would be the OLS estimator of $\beta$? And its variance? Compare these estimates with the estimates obtained by the method of WLS. What general conclusion do you draw?*

11.15. Table 11.7 gives data on 81 cars about MPG (average miles per gallon), HP (engine horsepower), VOL (cubic feet of cab space), SP (top speed, miles per hour), and WT (vehicle weight in 100 lbs.).
a. Consider the following model:

$$\text{MPG}_i = \beta_1 + \beta_2 \text{SP}_i + \beta_3 \text{HP}_i + \beta_4 \text{WT}_i + u_i$$

Estimate the parameters of this model and interpret the results. Do they make economic sense?

b. Would you expect the error variance in the preceding model to be heteroscedastic? Why?

c. Use the White test to find out if the error variance is heteroscedastic.

d. Obtain White's heteroscedasticity-consistent standard errors and t values and compare your results with those obtained from OLS.

e. If heteroscedasticity is established, how would you transform the data so that in the transformed data the error variance is homoscedastic? Show the necessary calculations.

11.16. Food expenditure in India. In Table 2.8 we have given data on expenditure on food and total expenditure for 55 families in India.

a. Regress expenditure on food on total expenditure, and examine the residuals obtained from this
regression.

b. Plot the residuals obtained in (a) against total expenditure and see if you observe any systematic 
pattern. 


*Adapted from G. A. F. Seber, Linear Regression Analysis, John Wiley & Sons, New York, 1977, p. 64.


Table 11.7 Passenger Car Mileage Data

Observation   MPG     Observation   MPG     Observation   MPG
1             65.4    15            39.6    29            46.9
2             56.0    16            39.3    30            36.3
3             55.9    17            38.9    31            36.1
4             49.0    18            38.8    32            36.1
5             46.5    19            38.2    33            35.4
6             46.2    20            42.2    34            35.3
7             45.4    21            40.9    35            35.1
8             59.2    22            40.7    36            35.1
9             53.3    23            40.0    37            35.0
10            43.4    24            39.3    38            33.2
11            41.1    25            38.8    39            32.9
12            40.9    26            38.4    40            32.3
13            40.9    27            38.4    41            32.2
14            40.4    28            38.4    42            32.2

Note:
VOL = cubic feet of cab space.
HP = engine horsepower.
MPG = average miles per gallon.
SP = top speed, miles per hour.
WT = vehicle weight, hundreds of pounds.
Observation = car observation number (names of cars not disclosed).

Source: U.S. Environmental Protection Agency, 1991, Report EPA/AA/CTAB/91-02.




c. If the plot in (b) suggests that there is heteroscedasticity, apply the Park, Glejser, and White tests to find out if the impression of heteroscedasticity observed in (b) is supported by these tests.

d. Obtain White's heteroscedasticity-consistent standard errors and compare those with the OLS standard errors. Decide if it is worth correcting for heteroscedasticity in this example.

11.17. Repeat Exercise 11.16, but this time regress the logarithm of expenditure on food on the logarithm of total expenditure. If you observe heteroscedasticity in the linear model of Exercise 11.16 but not in the log-linear model, what conclusion do you draw? Show all the necessary calculations.


11.18. A shortcut to White's test. As noted in the text, the White test can consume degrees of freedom if there are several regressors and if we introduce all the regressors, their squared terms, and their cross products. Therefore, instead of estimating regressions like Eq. (11.5.22), why not simply run the following regression:

$$\hat{u}_i^2 = \alpha_1 + \alpha_2 \hat{Y}_i + \alpha_3 \hat{Y}_i^2 + v_i$$

where $\hat{Y}_i$ are the estimated Y (i.e., regressand) values from whatever model you are estimating? After all, $\hat{Y}_i$ is simply the weighted average of the regressors, with the estimated regression coefficients serving as the weights.

Obtain the $R^2$ value from the preceding regression and use Eq. (11.5.22) to test the hypothesis that there is no heteroscedasticity.

Apply the preceding test to the food expenditure example of Exercise 11.16.

11.19. Return to the R&D example discussed in Section 11.7, Example 11.10. Repeat the example using profits as the regressor. A priori, would you expect your results to be different from those using sales as the regressor? Why or why not?

11.20. Table 11.8 gives data on median salaries of full professors in statistics in research universities in the United States for the academic year 2007.

a. Plot median salaries against years in rank (as a measure of years of experience). For plotting purposes, assume that the median salaries refer to the midpoint of years in rank. Thus, the median salary of 124,578 dollars in the range 4–5 refers to 4.5 years in rank, and so on. For the last group, assume that the range is 31–33.

b. Consider the following regression models:

$$Y_i = \alpha_1 + \alpha_2 X_i + u_i \qquad (1)$$
$$Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + v_i \qquad (2)$$


Table 11.8 Median Salaries of Full Professors in Statistics, 2007

Years in Rank   Count   Median Salary (dollars)
0 to 1          40      101,478
2 to 3          24      102,400
4 to 5          35      124,578
6 to 7          34      122,850
8 to 9          33      116,900
10 to 14        73      119,465
15 to 19        69      114,900
20 to 24        54      129,072
25 to 30        44      131,704
31 or more      25      143,000

Source: American Statistical Association, “2007 Salary Report.”
where Y = median salary, X = years in rank (measured at midpoint of the range), and u and v are the
error terms. Can you argue why model (2) might be preferable to model (1)? From the data given, 
estimate both the models. 

c. If you observe heteroscedasticity in model (1) but not in model (2), what conclusion would you 
draw? Show the necessary computations. 

d. If heteroscedasticity is observed in model (2), how would you transform the data so that in the 
transformed model there is no heteroscedasticity? 

11.21. You are given the following data: 


$\text{RSS}_1$ based on the first 30 observations = 55, df = 25
$\text{RSS}_2$ based on the last 30 observations = 140, df = 25

Carry out the Goldfeld–Quandt test of heteroscedasticity at the 5 percent level of significance.
11.22. Table 11.9 gives data on percent change per year for stock prices (Y) and consumer prices (X) for a cross section of 20 countries.

a. Plot the data in a scattergram. 

b. Regress Y on X and examine the residuals from this regression. What do you observe? 

c. Since the data for Chile seem atypical (outlier?), repeat the regression in (b), dropping the data on 
Chile. Now examine the residuals from this regression. What do you observe? 

d. If on the basis of the results in (b) you conclude that there was heteroscedasticity in error variance but on the basis of the results in (c) you reverse your conclusion, what general conclusions do you draw?


Table 11.9 Stock and Consumer Prices, Post-World War II Period (through 1969)

                        Rate of Change, % per Year
Country                 Stock Prices, Y    Consumer Prices, X
1. Australia            5.0                4.3
2. Austria              11.1               4.6
3. Belgium              3.2                2.4
4. Canada               7.9                2.4
5. Chile                25.5               26.4
6. Denmark              3.8                4.2
7. Finland              [illegible]        5.5
8. France               9.9                4.7
9. Germany              13.3               2.2
10. India               [illegible]        4.0
11. Ireland             6.4                4.0
12. Israel              8.9                8.4
13. Italy               8.1                3.3
14. Japan               [illegible]        4.7
15. Mexico              4.7                5.2
16. Netherlands         7.5                3.6
17. New Zealand         4.7                3.6
18. Sweden              8.0                4.0
19. United Kingdom      7.5                3.9
20. United States       9.0                2.1


Source: Phillip Cagan, Common Stock Values and Inflation: The Historical Record of Many Countries, National Bureau of Economic Research, 
Suppl., March 1974, Table 1, p. 4. 
11.23. Table 11.10 from the website gives salary and related data on 447 executives of Fortune 500 companies. Data include salary = 1999 salary and bonuses; totcomp = 1999 CEO total compensation; tenure = number of years as CEO (0 if less than 6 months); age = age of CEO; sales = total 1998 sales revenue of the firm; profits = 1998 profits for the firm; and assets = total assets of the firm in 1998.

a. Estimate the following regression from these data and obtain the Breusch–Pagan–Godfrey statistic to check for heteroscedasticity:

$$\text{salary}_i = \beta_1 + \beta_2 \text{tenure}_i + \beta_3 \text{age}_i + \beta_4 \text{sales}_i + \beta_5 \text{profits}_i + \beta_6 \text{assets}_i + u_i$$

Does there seem to be a problem with heteroscedasticity?

b. Now create a second model using ln(salary) as the dependent variable. Is there any improvement in the heteroscedasticity?

c. Create scattergrams of salary vs. each of the independent variables. Can you discern which variable(s) is (are) contributing to the issue? What suggestions would you make now to address this? What is your final model?


Key to Multiple Choice Questions

1. (b)   2. (d)   3. (a)   4. (c)   5. (b)   6. (b)   7. (a)   8. [illegible]   9. (b)   10. (a)
11. [illegible]   12. [illegible]   13. (a)   14. [illegible]   15. (b)   16. (b)   17. [illegible]   18. (b)   19. [illegible]   20. (d)


Appendix 11A
11A.1 Proof of Equation (11.2.2)


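From Appendix 3A, Section 3A.3, we have

$$\operatorname{var}(\hat{\beta}_2) = E\left(k_1 u_1 + k_2 u_2 + \cdots + k_n u_n\right)^2 = E\left(k_1^2 u_1^2 + k_2^2 u_2^2 + \cdots + k_n^2 u_n^2 + 2 \text{ cross-product terms}\right)$$
$$= E\left(k_1^2 u_1^2 + k_2^2 u_2^2 + \cdots + k_n^2 u_n^2\right)$$

since the expectations of the cross-product terms are zero because of the assumption of no serial correlation, and

$$\operatorname{var}(\hat{\beta}_2) = k_1^2 E(u_1^2) + k_2^2 E(u_2^2) + \cdots + k_n^2 E(u_n^2)$$

since the $k_i$ are known. (Why?)

$$\operatorname{var}(\hat{\beta}_2) = k_1^2 \sigma_1^2 + k_2^2 \sigma_2^2 + \cdots + k_n^2 \sigma_n^2$$

since $E(u_i^2) = \sigma_i^2$. Therefore,

$$\operatorname{var}(\hat{\beta}_2) = \sum k_i^2 \sigma_i^2 = \sum \left[\left(\frac{x_i}{\sum x_i^2}\right)^2 \sigma_i^2\right] \qquad \text{since } k_i = \frac{x_i}{\sum x_i^2}$$
$$= \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2} \qquad (11.2.2)$$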
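As a numerical sanity check on Eq. (11.2.2), the following Python sketch draws many samples from a two-variable model with heteroscedastic disturbances and compares the Monte Carlo variance of the OLS slope with the formula. The regressor values, the true coefficients, and the variance structure $\sigma_i^2 = 0.5X_i^2$ are assumptions chosen only for illustration; NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions (not from the text): fixed X values, true betas,
# and a heteroscedastic structure sigma_i^2 = 0.5 * X_i^2.
X = np.linspace(1.0, 20.0, 30)
beta1, beta2 = 2.0, 0.7
sigma2 = 0.5 * X**2

x = X - X.mean()                      # deviations x_i = X_i - Xbar
sum_x2 = np.sum(x**2)

# Eq. (11.2.2): var(beta2_hat) = sum(x_i^2 sigma_i^2) / (sum x_i^2)^2
formula_var = np.sum(x**2 * sigma2) / sum_x2**2
# Homoscedastic formula (11.2.3) evaluated at the average sigma_i^2, for contrast
naive_var = sigma2.mean() / sum_x2

# Monte Carlo: each row is one sample of 30 observations with u_i ~ N(0, sigma_i^2)
reps = 20_000
u = rng.normal(0.0, np.sqrt(sigma2), size=(reps, X.size))
Y = beta1 + beta2 * X + u
slopes = Y @ x / sum_x2               # OLS slope sum(x_i * Y_i) / sum(x_i^2) for every sample
sim_var = slopes.var()

print(f"simulated var(beta2_hat): {sim_var:.6f}")
print(f"Eq. (11.2.2):             {formula_var:.6f}")
print(f"homoscedastic formula:    {naive_var:.6f}")
```

Under this assumed variance structure $\sigma_i^2$ is positively related to $x_i^2$, so the correct variance exceeds what the homoscedastic formula would suggest.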
11A.2 The Method of Weighted Least Squares


To illustrate the method, we use the two-variable model $Y_i = \beta_1 + \beta_2 X_i + u_i$. The unweighted least-squares method minimizes

$$\sum \hat{u}_i^2 = \sum \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i\right)^2 \qquad (1)$$

to obtain the estimates, whereas the weighted least-squares method minimizes the weighted residual sum of squares:

$$\sum w_i \hat{u}_i^2 = \sum w_i \left(Y_i - \hat{\beta}_1^* - \hat{\beta}_2^* X_i\right)^2 \qquad (2)$$

where $\hat{\beta}_1^*$ and $\hat{\beta}_2^*$ are the weighted least-squares estimators and where the weights $w_i$ are such that

$$w_i = \frac{1}{\sigma_i^2} \qquad (3)$$

that is, the weights are inversely proportional to the variance of $u_i$ or $Y_i$ conditional upon the given $X_i$, it being understood that $\operatorname{var}(u_i \mid X_i) = \operatorname{var}(Y_i \mid X_i) = \sigma_i^2$.
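Differentiating Eq. (2) with respect to $\hat{\beta}_1^*$ and $\hat{\beta}_2^*$, we obtain

$$\frac{\partial \sum w_i \hat{u}_i^2}{\partial \hat{\beta}_1^*} = 2 \sum w_i \left(Y_i - \hat{\beta}_1^* - \hat{\beta}_2^* X_i\right)(-1)$$

$$\frac{\partial \sum w_i \hat{u}_i^2}{\partial \hat{\beta}_2^*} = 2 \sum w_i \left(Y_i - \hat{\beta}_1^* - \hat{\beta}_2^* X_i\right)(-X_i)$$

Setting the preceding expressions equal to zero, we obtain the following two normal equations:

$$\sum w_i Y_i = \hat{\beta}_1^* \sum w_i + \hat{\beta}_2^* \sum w_i X_i \qquad (4)$$

$$\sum w_i X_i Y_i = \hat{\beta}_1^* \sum w_i X_i + \hat{\beta}_2^* \sum w_i X_i^2 \qquad (5)$$

Notice the similarity between these normal equations and the normal equations of unweighted least squares.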
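Solving these equations simultaneously, we obtain

$$\hat{\beta}_1^* = \bar{Y}^* - \hat{\beta}_2^* \bar{X}^* \qquad (6)$$

and

$$\hat{\beta}_2^* = \frac{\left(\sum w_i\right)\left(\sum w_i X_i Y_i\right) - \left(\sum w_i X_i\right)\left(\sum w_i Y_i\right)}{\left(\sum w_i\right)\left(\sum w_i X_i^2\right) - \left(\sum w_i X_i\right)^2} \qquad (11.3.8) = (7)$$

The variance of $\hat{\beta}_2^*$ shown in Eq. (11.3.9) can be obtained in the manner of the variance of $\hat{\beta}_2$ shown in Appendix 3A, Section 3A.3.

Note: $\bar{Y}^* = \sum w_i Y_i / \sum w_i$ and $\bar{X}^* = \sum w_i X_i / \sum w_i$. As can be readily verified, these weighted means coincide with the usual or unweighted means $\bar{Y}$ and $\bar{X}$ when $w_i = w$, a constant, for all i.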
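The closed-form solution in Eqs. (6) and (7) is easy to code directly. The Python sketch below is a minimal illustration, assuming NumPy and simulated data in which the standard deviation of $u_i$ is proportional to $X_i$ (an assumption, not an example from the text). It cross-checks the result against ordinary least squares applied to the data multiplied by $\sqrt{w_i}$, which minimizes the same weighted residual sum of squares as Eq. (2).

```python
import numpy as np

def wls_two_variable(X, Y, w):
    """Two-variable WLS estimates from Eqs. (6) and (7) of Appendix 11A.2."""
    sw, swx, swy = w.sum(), np.sum(w * X), np.sum(w * Y)
    swxy, swx2 = np.sum(w * X * Y), np.sum(w * X**2)
    b2 = (sw * swxy - swx * swy) / (sw * swx2 - swx**2)   # Eq. (7)
    b1 = swy / sw - b2 * (swx / sw)                       # Eq. (6): Ybar* - b2 * Xbar*
    return b1, b2

# Illustrative data (assumed): sd of u_i proportional to X_i
rng = np.random.default_rng(1)
X = np.arange(1.0, 41.0)
sigma = 0.3 * X
Y = 5.0 + 1.5 * X + rng.normal(0.0, sigma)
w = 1.0 / sigma**2                                        # Eq. (3): w_i = 1 / sigma_i^2

b1_star, b2_star = wls_two_variable(X, Y, w)

# Cross-check: OLS on the data scaled by sqrt(w_i) minimizes the same
# weighted residual sum of squares as Eq. (2).
root_w = np.sqrt(w)
A = np.column_stack([root_w, root_w * X])
check, *_ = np.linalg.lstsq(A, root_w * Y, rcond=None)

print("Eqs. (6)-(7):     ", b1_star, b2_star)
print("scaled OLS check: ", check)
```

Calling `wls_two_variable` with all weights equal returns the ordinary least-squares estimates, in line with the Note on the weighted means.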
11A.3 Proof that $E(\hat{\sigma}^2) \neq \sigma^2$ in the Presence of Heteroscedasticity


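Consider the two-variable model:

$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad (1)$$

where $\operatorname{var}(u_i) = \sigma_i^2$.

Now

$$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n-2} = \frac{\sum (Y_i - \hat{Y}_i)^2}{n-2} = \frac{\sum \left[\beta_1 + \beta_2 X_i + u_i - \hat{\beta}_1 - \hat{\beta}_2 X_i\right]^2}{n-2}$$
$$= \frac{\sum \left[-(\hat{\beta}_1 - \beta_1) - (\hat{\beta}_2 - \beta_2) X_i + u_i\right]^2}{n-2} \qquad (2)$$

Noting that $(\hat{\beta}_1 - \beta_1) = -(\hat{\beta}_2 - \beta_2)\bar{X} + \bar{u}$, so that the term in brackets becomes $-(\hat{\beta}_2 - \beta_2)x_i + (u_i - \bar{u})$, and substituting this into Eq. (2) and taking expectations on both sides, we get:

$$E(\hat{\sigma}^2) = \frac{1}{n-2}\left\{-\sum x_i^2 \operatorname{var}(\hat{\beta}_2) + E\left[\sum (u_i - \bar{u})^2\right]\right\}$$
$$= \frac{1}{n-2}\left[-\frac{\sum x_i^2 \sigma_i^2}{\sum x_i^2} + \frac{(n-1)\sum \sigma_i^2}{n}\right] \qquad (3)$$

where use is made of Eq. (11.2.2).

As you can see from Eq. (3), if there is homoscedasticity, that is, $\sigma_i^2 = \sigma^2$ for each i, then $E(\hat{\sigma}^2) = \sigma^2$. Therefore, the expected value of the conventionally computed $\hat{\sigma}^2 = \sum \hat{u}_i^2/(n-2)$ will not be equal to the true $\sigma^2$ in the presence of heteroscedasticity.¹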
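Eq. (3) can be checked by simulation. The Python sketch below, a minimal illustration assuming NumPy, a fixed design of 25 X values, and the made-up variance structure $\sigma_i^2 = 0.2 + 0.3X_i^2$, averages the conventional $\hat{\sigma}^2$ over many samples and compares it with the right-hand side of Eq. (3).

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative assumptions: n = 25 fixed X values and sigma_i^2 = 0.2 + 0.3 X_i^2
n = 25
X = np.linspace(0.0, 10.0, n)
x = X - X.mean()
sigma2 = 0.2 + 0.3 * X**2

# Right-hand side of Eq. (3)
eq3 = (-np.sum(x**2 * sigma2) / np.sum(x**2) + (n - 1) * sigma2.sum() / n) / (n - 2)

# Monte Carlo average of the conventional estimator sum(u_hat^2) / (n - 2)
reps = 20_000
u = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
Y = 1.0 + 0.5 * X + u
b2 = Y @ x / np.sum(x**2)                    # OLS slope for every replication
b1 = Y.mean(axis=1) - b2 * X.mean()          # OLS intercept for every replication
resid = Y - b1[:, None] - b2[:, None] * X
sigma2_hat = np.sum(resid**2, axis=1) / (n - 2)

print(f"average sigma2_hat: {sigma2_hat.mean():.4f}")
print(f"Eq. (3):            {eq3:.4f}")
print(f"average sigma_i^2:  {sigma2.mean():.4f}")
```

Setting all $\sigma_i^2$ equal in the `eq3` expression returns that common value exactly, which is the homoscedastic special case noted after Eq. (3).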


11A.4 White's Robust Standard Errors

To give you some idea about White's heteroscedasticity-corrected standard errors, consider the two-variable regression model:

$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad \operatorname{var}(u_i) = \sigma_i^2 \qquad (1)$$

As shown in Eq. (11.2.2),

$$\operatorname{var}(\hat{\beta}_2) = \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2} \qquad (2)$$

Since the $\sigma_i^2$ are not directly observable, White suggests using $\hat{u}_i^2$, the squared residual for each i, in place of $\sigma_i^2$ and estimating $\operatorname{var}(\hat{\beta}_2)$ as follows:

$$\widehat{\operatorname{var}}(\hat{\beta}_2) = \frac{\sum x_i^2 \hat{u}_i^2}{\left(\sum x_i^2\right)^2} \qquad (3)$$

White has shown that Eq. (3) is a consistent estimator of Eq. (2), that is, as the sample size increases indefinitely, Eq. (3) converges to Eq. (2).²


¹Further details can be obtained from Jan Kmenta, Elements of Econometrics, 2d ed., Macmillan, New York, 1986, pp. 276–278.

²To be more precise, n times Eq. (3) converges in probability to $E[(X_i - \mu_X)^2 u_i^2]/(\sigma_X^2)^2$, which is the probability limit of n times Eq. (2), where n is the sample size, $\mu_X$ is the expected value of X, and $\sigma_X^2$ is the (population) variance of X. For more details, see Jeffrey M. Wooldridge, Introductory Econometrics: A Modern Approach, South-Western Publishing, 2000, p. 250.
Incidentally, note that if your software package does not contain White's robust standard error procedure,
you can do it as shown in Eq. (3) by first running the usual OLS regression, obtaining the residuals from this 
regression, and then using formula (3). 
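The do-it-yourself route just described, running OLS, keeping the residuals, and applying formula (3), can be sketched in Python as follows. This is a minimal illustration assuming NumPy and simulated data whose error spread grows with X; the function name `white_slope_se` is hypothetical. It also returns the conventional OLS standard error for comparison.

```python
import numpy as np

def white_slope_se(X, Y):
    """OLS slope with White's standard error from Eq. (3) of Appendix 11A.4,
    plus the conventional OLS standard error for comparison."""
    x = X - X.mean()
    b2 = np.sum(x * Y) / np.sum(x**2)                          # usual OLS slope
    b1 = Y.mean() - b2 * X.mean()
    u_hat = Y - b1 - b2 * X                                    # step 1: OLS residuals
    var_white = np.sum(x**2 * u_hat**2) / np.sum(x**2) ** 2    # step 2: Eq. (3)
    var_ols = np.sum(u_hat**2) / (X.size - 2) / np.sum(x**2)   # sigma_hat^2 / sum x_i^2
    return b2, np.sqrt(var_white), np.sqrt(var_ols)

# Illustrative data (assumed): error sd grows with X
rng = np.random.default_rng(3)
X = np.linspace(1.0, 10.0, 50)
Y = 1.0 + 2.0 * X + rng.normal(0.0, 0.5 * X)
b2, se_white, se_ols = white_slope_se(X, Y)
print(f"slope = {b2:.4f}, White se = {se_white:.4f}, OLS se = {se_ols:.4f}")
```

Formula (3) has no degrees-of-freedom adjustment; some packages apply a small-sample correction by default, so their reported robust standard errors can differ slightly from this sketch.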

White's procedure can be generalized to the k-variable regression model

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \qquad (4)$$

The variance of any partial regression coefficient, say $\hat{\beta}_j$, is obtained as follows:

$$\widehat{\operatorname{var}}(\hat{\beta}_j) = \frac{\sum \hat{w}_{ji}^2 \hat{u}_i^2}{\left(\sum \hat{w}_{ji}^2\right)^2} \qquad (5)$$

where $\hat{u}_i$ are the residuals obtained from the (original) regression (4) and $\hat{w}_{ji}$ are the residuals obtained from the (auxiliary) regression of the regressor $X_j$ on the remaining regressors in Eq. (4).

Obviously, this is a time-consuming procedure, for you will have to estimate Eq. (5) for each X variable. 
Of course, all this labor can be avoided if you have a statistical package that does this routinely. Packages 
such as PC-GIVE, EViews, MICROFIT, SHAZAM, STATA, and LIMDEP now obtain White’s heteroscedas- 
ticity-robust standard errors very easily. 
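The auxiliary-regression recipe of Eq. (5) can be sketched in a few lines of Python. The sketch handles one coefficient at a time, as the text describes; it assumes NumPy, a design matrix with the intercept in column 0, and simulated data, and the helper names `ols_fit` and `white_var_coef` are hypothetical. The same numbers equal the diagonal of the matrix "sandwich" estimator that regression packages typically compute in one step.

```python
import numpy as np

def ols_fit(Z, y):
    """OLS coefficients and residuals; Z already includes the intercept column."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef, y - Z @ coef

def white_var_coef(Z, y, j):
    """White variance of coefficient j via Eq. (5) of Appendix 11A.4."""
    _, u_hat = ols_fit(Z, y)                   # residuals of the original regression (4)
    others = np.delete(Z, j, axis=1)           # remaining regressors, intercept included
    _, w_hat = ols_fit(others, Z[:, j])        # residuals of the auxiliary regression of X_j
    return np.sum(w_hat**2 * u_hat**2) / np.sum(w_hat**2) ** 2

# Illustrative data (assumed): two regressors, error variance rising with X2
rng = np.random.default_rng(4)
n = 200
X2 = rng.uniform(1.0, 10.0, n)
X3 = 0.5 * X2 + rng.normal(0.0, 1.0, n)       # deliberately correlated with X2
y = 1.0 + 0.8 * X2 - 0.3 * X3 + rng.normal(0.0, 0.4 * X2)
Z = np.column_stack([np.ones(n), X2, X3])

for j, name in [(1, "X2"), (2, "X3")]:
    print(name, "White se:", np.sqrt(white_var_coef(Z, y, j)))
```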


CHAPTER 12


Autocorrelation: What Happens if the Error Terms Are Correlated?


The reader may recall that there are generally three types of data that are available for empirical analysis: (1) 
cross section, (2) time series, and (3) combination of cross section and time series, also known as pooled data. 
In developing the classical linear regression model (CLRM) in Part 1 we made several assumptions, which 
were discussed in Section 7.1. However, we noted that not all of these assumptions would hold in every type 
of data. As a matter of fact, we saw in the previous chapter that the assumption of homoscedasticity, or equal
error variance, may not always be tenable in cross-sectional data. In other words, cross-sectional data are 
often plagued by the problem of heteroscedasticity. 

However, in cross-section studies, data are often collected on the basis of a random sample of cross-sectional
units, such as households (in a consumption function analysis) or firms (in an investment analysis), so that
there is no prior reason to believe that the error term pertaining to one household or firm is
correlated with the error term of another household or firm. If by chance such a correlation is observed in 
cross-sectional units, it is called spatial autocorrelation, that is, correlation in space rather than over time. 
However, it is important to remember that, in cross-sectional analysis, the ordering of the data must have 
some logic, or economic interest, to make sense of any determination of whether (spatial) autocorrelation is 
present or not. 

The situation, however, is likely to be very different if we are dealing with time series data, for the obser- 
vations in such data follow a natural ordering over time so that successive observations are likely to exhibit 
intercorrelations, especially if the time interval between successive observations is short, such as a day, a 
week, or a month rather than a year. If you observe stock price indexes, such as the S&P CNX Nifty index 
or BSE Sensex index, over successive days, it is not unusual to find that these indexes move up or down for 
several days in succession. Obviously, in situations like this, the assumption of no auto-, or serial, corre- 
lation in the error terms that underlies the CLRM will be violated. 

In this chapter we take a critical look at this assumption with a view to answering the following questions: 


1. What is the nature of autocorrelation? 
2. What are the theoretical and practical consequences of autocorrelation?

3. Since the assumption of no autocorrelation relates to the unobservable disturbances $u_t$, how does one
know that there is autocorrelation in any given situation? Notice that we now use the subscript t to 
emphasize that we are dealing with time series data. 

4. How does one remedy the problem of autocorrelation? 


The reader will find this chapter in many ways similar to the preceding chapter on heteroscedasticity 
in that under both heteroscedasticity and autocorrelation the usual OLS estimators, although linear, 
unbiased, and asymptotically (i.e., in large samples) normally distributed,¹ are no longer minimum
variance among all linear unbiased estimators. In short, they are not efficient relative to other linear
and unbiased estimators. Put differently, they may not be best linear unbiased estimators (BLUE). As
a result, the usual t, F, and $\chi^2$ tests may not be valid.


12.1 The Nature of the Problem 


The term autocorrelation may be defined as “correlation between members of series of observations ordered in time [as in time series data] or space [as in cross-sectional data].”² In the regression context, the classical linear regression model assumes that such autocorrelation does not exist in the disturbances $u_i$. Symbolically,

$$\operatorname{cov}(u_i, u_j \mid x_i, x_j) = E(u_i u_j) = 0 \qquad i \neq j \qquad (3.2.5)$$

Put simply, the classical model assumes that the disturbance term relating to any observation is not influ- 
enced by the disturbance term relating to any other observation. For example, if we are dealing with quarterly 
time series data involving the regression of output on labor and capital inputs and if, say, there is a labor 
strike affecting output in one quarter, there is no reason to believe that this disruption will be carried over to 
the next quarter. That is, if output is lower this quarter, there is no reason to expect it to be lower next quarter. 
Similarly, if we are dealing with cross-sectional data involving the regression of family consumption expen- 
diture on family income, the effect of an increase of one family’s income on its consumption expenditure is 
not expected to affect the consumption expenditure of another family. 

However, if there is such a dependence, we have autocorrelation. Symbolically, 

$$E(u_i u_j) \neq 0 \qquad i \neq j \qquad (12.1.1)$$

In this situation, the disruption caused by a strike this quarter may very well affect output next quarter, or 
the increases in the consumption expenditure of one family may very well prompt another family to increase 
its consumption expenditure if it wants to keep up with the Joneses. 

Before we find out why autocorrelation exists, it is essential to clear up some terminological questions. 
Although it is now a common practice to treat the terms autocorrelation and serial correlation synonymously, some authors prefer to distinguish the two terms. For example, Tintner defines autocorrelation as “lag correlation of a given series with itself, lagged by a number of time units,” whereas he reserves the term serial correlation to define “lag correlation between two different series.”³ Thus, correlation between two time series such as $u_1, u_2, \ldots, u_{10}$ and $u_2, u_3, \ldots, u_{11}$, where the former is the latter series lagged by one time period, is autocorrelation, whereas correlation between time series such as $u_1, u_2, \ldots, u_{10}$ and $v_2, v_3, \ldots, v_{11}$, where u and v are two different time series, is called serial correlation. Although the distinction between the two terms may be useful, in this book we shall treat them synonymously.


¹On this, see William H. Greene, Econometric Analysis, 4th ed., Prentice Hall, NJ, 2000, Chapter 11, and Paul A. Ruud, An Introduction to Classical Econometric Theory, Oxford University Press, 2000, Chapter 19.
²Maurice G. Kendall and William R. Buckland, A Dictionary of Statistical Terms, Hafner Publishing Company, New York, 1971, p. 8.
³Gerhard Tintner, Econometrics, John Wiley & Sons, New York, 1965.
Let us visualize some of the plausible patterns of auto- and nonautocorrelation, which are given in Figure 12.1. Figures 12.1a to d show that there is a discernible pattern among the u's. Figure 12.1a shows a cyclical pattern; Figures 12.1b and c suggest an upward or downward linear trend in the disturbances; whereas Figure 12.1d indicates that both linear and quadratic trend terms are present in the disturbances. Only Figure 12.1e indicates no systematic pattern, supporting the nonautocorrelation assumption of the classical linear regression model.


Figure 12.1 Patterns of autocorrelation and nonautocorrelation. (Panels a to e each plot the disturbance u against time.)
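One simple way to put a number on what the panels of Figure 12.1 show visually is the sample correlation between each disturbance and its predecessor. The Python sketch below assumes NumPy and generates three hypothetical disturbance series (a cycle, a linear trend, and pure noise; all values are made up) and computes this lag-1 correlation for each. It is only a descriptive statistic, not one of the formal tests developed later in the chapter.

```python
import numpy as np

def lag1_corr(e):
    """Sample correlation between e_t and e_(t-1)."""
    return np.corrcoef(e[1:], e[:-1])[0, 1]

rng = np.random.default_rng(5)
t = np.arange(200)

def small_noise():
    return 0.2 * rng.normal(size=t.size)

# Hypothetical disturbance series imitating some panels of Figure 12.1
series = {
    "cyclical": np.sin(2 * np.pi * t / 25) + small_noise(),   # like panel (a)
    "trend": 0.02 * t - 1.0 + small_noise(),                  # like panels (b) and (c)
    "no pattern": rng.normal(size=t.size),                    # like panel (e)
}
r1 = {name: lag1_corr(e) for name, e in series.items()}
for name, value in r1.items():
    print(f"{name:>10s}: lag-1 correlation = {value:+.2f}")
```

The patterned series give lag-1 correlations close to 1, whereas the patternless series gives a value near 0.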
The natural question is: Why does serial correlation occur? There are several reasons, some of which are
as follows: 


Inertia 


A salient feature of most economic time series is inertia, or sluggishness. As is well known, time series 
such as GNP, price indexes, production, employment, and unemployment exhibit (business) cycles. Starting 
at the bottom of the recession, when economic recovery starts, most of these series start moving upward. 
In this upswing, the value of a series at one point in time is greater than its previous value. Thus there is a 
“momentum” built into them, and it continues until something happens (e.g., increase in interest rate or taxes 
or both) to slow them down. Therefore, in regressions involving time series data, successive observations are 
likely to be interdependent. 


Specification Bias: Excluded Variables Case 


In empirical analysis the researcher often starts with a plausible regression model that may not be the most 
“perfect” one. After the regression analysis, the researcher does the postmortem to find out whether the 
results accord with a priori expectations. If not, surgery is begun. For example, the researcher may plot 
the residuals $\hat{u}_i$ obtained from the fitted regression and may observe patterns such as those shown in Figure
12.1a to d. These residuals (which are proxies for $u_i$) may suggest that some variables that were originally
candidates but were not included in the model for a variety of reasons should be included. This is the case of 
excluded variable specification bias. Often the inclusion of such variables removes the correlation pattern 
observed among the residuals. For example, suppose we have the following demand model: 

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \beta_4 X_{4t} + u_t \qquad (12.1.2)$$

where Y = quantity of beef demanded, $X_2$ = price of beef, $X_3$ = consumer income, $X_4$ = price of pork, and t = time.⁴ However, for some reason we run the following regression:

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + v_t \qquad (12.1.3)$$

Now if Eq. (12.1.2) is the “correct” model or the “truth” or true relation, running Eq. (12.1.3) is tantamount to letting $v_t = \beta_4 X_{4t} + u_t$. And to the extent the price of pork affects the consumption of beef, the error or disturbance term v will reflect a systematic pattern, thus creating (false) autocorrelation. A simple test of this would be to run both Eqs. (12.1.2) and (12.1.3) and see whether autocorrelation, if any, observed in model (12.1.3) disappears when model (12.1.2) is run. The actual mechanics of detecting autocorrelation will be discussed in Section 12.6, where we will show that a plot of the residuals from regressions (12.1.2) and (12.1.3) will often shed considerable light on serial correlation.
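The "simple test" just described, running both regressions and comparing their residuals, can be imitated with simulated data. In the Python sketch below (assuming NumPy), all numbers are hypothetical: the price of pork is made to drift smoothly over time, the true disturbance is serially independent, and only the structure mirrors Eqs. (12.1.2) and (12.1.3). The lag-1 correlation of the residuals serves as a rough descriptive gauge of autocorrelation.

```python
import numpy as np

def ols_residuals(Z, y):
    """Residuals from an OLS fit of y on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ coef

def lag1_corr(e):
    """Sample correlation between e_t and e_(t-1)."""
    return np.corrcoef(e[1:], e[:-1])[0, 1]

rng = np.random.default_rng(6)
n = 200
t = np.arange(n)

# Hypothetical data; only the structure mirrors Eqs. (12.1.2) and (12.1.3)
X2 = 10.0 + rng.normal(size=n)                               # price of beef
X3 = 50.0 + rng.normal(size=n)                               # consumer income
X4 = 6.0 + 0.04 * t + 0.5 * np.sin(2 * np.pi * t / 40)       # price of pork, smooth over time
u = rng.normal(size=n)                                       # serially independent disturbance
Y = 20.0 - 1.5 * X2 + 0.4 * X3 + 1.5 * X4 + u                # "true" model

ones = np.ones(n)
r_full = lag1_corr(ols_residuals(np.column_stack([ones, X2, X3, X4]), Y))
r_short = lag1_corr(ols_residuals(np.column_stack([ones, X2, X3]), Y))   # omits X4

print(f"lag-1 residual correlation, model with X4:    {r_full:+.2f}")
print(f"lag-1 residual correlation, model without X4: {r_short:+.2f}")
```

In this setup the apparent autocorrelation in the short regression comes entirely from the omitted, slowly moving $X_4$; it largely disappears once $X_4$ is included.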


Specification Bias: Incorrect Functional Form 
Suppose the “true” or correct model in a cost-output study is as follows:

$$\text{Marginal cost}_i = \beta_1 + \beta_2\,\text{output}_i + \beta_3\,\text{output}_i^2 + u_i \qquad (12.1.4)$$

but we fit the following model:

$$\text{Marginal cost}_i = \alpha_1 + \alpha_2\,\text{output}_i + v_i \qquad (12.1.5)$$

The marginal cost curve corresponding to the “true” model is shown in Figure 12.2 along with the “incorrect” linear cost curve.


⁴ As a matter of convention, we shall use the subscript $t$ to denote time series data and the usual subscript $i$ for cross-sectional 
data. 

⁵ If it is found that the real problem is one of specification bias, not autocorrelation, then as will be shown in Chapter 13, the 
OLS estimators of the parameters in Eq. (12.1.3) may be biased as well as inconsistent. 
Figure 12.2 Specification bias: incorrect functional form. (Marginal cost of production plotted against output.) 


As Figure 12.2 shows, between points A and B the linear marginal cost curve will consistently overestimate 
the true marginal cost, whereas beyond these points it will consistently underestimate the true marginal 
cost. This result is to be expected, because the disturbance term $v_i$ is, in fact, equal to $\beta_3\,\text{output}_i^2 + u_i$ and hence 
will catch the systematic effect of the $\text{output}^2$ term on marginal cost. In this case, $v_i$ will reflect autocorrelation 
because of the use of an incorrect functional form. In Chapter 13 we will consider several methods of 
detecting specification bias. 
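A sketch of the functional-form case, assuming a hypothetical U-shaped marginal cost curve ($\beta_1 = 50$, $\beta_2 = -4$, $\beta_3 = 0.15$, output from 1 to 40, standard normal $u_i$). Fitting the straight line (12.1.5) leaves residuals that are positive at both ends and negative in the middle, so that, ordered by output, neighboring residuals tend to share the same sign.

```python
import random

def simple_ols(x, y):
    """Two-variable OLS; returns (intercept, slope, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    intercept = ybar - slope * xbar
    return intercept, slope, [b - intercept - slope * a for a, b in zip(x, y)]

def lag1_autocorr(e):
    """Sample first-order autocorrelation (deviations from the sample mean)."""
    m = sum(e) / len(e)
    num = sum((e[t] - m) * (e[t - 1] - m) for t in range(1, len(e)))
    return num / sum((v - m) ** 2 for v in e)

rng = random.Random(5)
output = [float(q) for q in range(1, 41)]
# Hypothetical quadratic "true" marginal cost, Eq. (12.1.4).
mc = [50 - 4 * q + 0.15 * q ** 2 + rng.gauss(0, 1) for q in output]
a1, a2, resid = simple_ols(output, mc)   # misspecified linear model, Eq. (12.1.5)

print("".join("+" if e > 0 else "-" for e in resid))   # residual signs, ordered by output
print(f"lag-1 autocorrelation of residuals: {lag1_autocorr(resid):.2f}")
```

The sign string should show runs of plus signs at low and high output and a long run of minus signs in between, where the fitted line overestimates marginal cost.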


Cobweb Phenomenon 


The supply of many agricultural commodities reflects the so-called cobweb phenomenon, where supply 
reacts to price with a lag of one time period because supply decisions take time to implement (the gestation 
period). Thus, at the beginning of this year’s planting of crops, farmers are influenced by the price prevailing 
last year, so that their supply function is 


$$\text{Supply}_t = \beta_1 + \beta_2 P_{t-1} + u_t \qquad (12.1.6)$$


Suppose at the end of period $t$, price $P_t$ turns out to be lower than $P_{t-1}$. Therefore, in period $t + 1$ farmers 
may very well decide to produce less than they did in period $t$. Obviously, in this situation the disturbances 
$u_t$ are not expected to be random because if the farmers overproduce in year $t$, they are likely to reduce their 
production in $t + 1$, and so on, leading to a cobweb pattern. 


Lags 


In a time series regression of consumption expenditure on income, it is not uncommon to find that the 
consumption expenditure in the current period depends, among other things, on the consumption expenditure 
of the previous period. That is, 


$$\text{Consumption}_t = \beta_1 + \beta_2\,\text{income}_t + \beta_3\,\text{consumption}_{t-1} + u_t \qquad (12.1.7)$$


A regression such as Eq. (12.1.7) is known as autoregression because one of the explanatory variables is the 
lagged value of the dependent variable. (We shall study such models in Chapter 17.) The rationale for a model 
such as Eq. (12.1.7) is simple. Consumers do not change their consumption habits readily for psychological, 
technological, or institutional reasons. Now if we neglect the lagged term in Eq. (12.1.7), the resulting error 
term will reflect a systematic pattern due to the influence of lagged consumption on current consumption. 
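A similar sketch for the omitted lagged term, with hypothetical numbers: consumption follows (12.1.7) with $\beta_1 = 10$, $\beta_2 = 0.3$, $\beta_3 = 0.6$ and standard normal $u_t$, and income is assumed to be a smooth upward trend with a little noise. Dropping consumption$_{t-1}$ from the regression should leave a positively autocorrelated residual; keeping it should not. The `ols` helper is a hand-written normal-equations solver defined here, not a library function.

```python
import random

def ols(columns, y):
    """OLS with an intercept via the normal equations; returns (coefficients, residuals)."""
    n, k = len(y), len(columns) + 1
    rows = [[1.0] + [col[t] for col in columns] for t in range(n)]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    for i in range(k):  # Gaussian elimination with partial pivoting
        p = max(range(i, k), key=lambda row: abs(a[row][i]))
        a[i], a[p], b[i], b[p] = a[p], a[i], b[p], b[i]
        for row in range(i + 1, k):
            f = a[row][i] / a[i][i]
            for col in range(i, k):
                a[row][col] -= f * a[i][col]
            b[row] -= f * b[i]
    beta = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        beta[i] = (b[i] - sum(a[i][c] * beta[c] for c in range(i + 1, k))) / a[i][i]
    resid = [yt - sum(bj * xj for bj, xj in zip(beta, r)) for r, yt in zip(rows, y)]
    return beta, resid

def lag1_autocorr(e):
    """Sample first-order autocorrelation (deviations from the sample mean)."""
    m = sum(e) / len(e)
    num = sum((e[t] - m) * (e[t - 1] - m) for t in range(1, len(e)))
    return num / sum((v - m) ** 2 for v in e)

rng = random.Random(17)
n = 300
income = [200 + 2 * t + rng.gauss(0, 0.5) for t in range(n)]   # smooth trending income (hypothetical)
cons = [(10 + 0.3 * income[0]) / (1 - 0.6)]                     # start near the steady state
for t in range(1, n):
    cons.append(10 + 0.3 * income[t] + 0.6 * cons[-1] + rng.gauss(0, 1))   # Eq. (12.1.7)

y, inc, lag_cons = cons[1:], income[1:], cons[:-1]
_, resid_with_lag = ols([inc, lag_cons], y)   # lagged consumption included
_, resid_without_lag = ols([inc], y)          # lagged consumption neglected
r_with_lag = lag1_autocorr(resid_with_lag)
r_without_lag = lag1_autocorr(resid_without_lag)
print(f"lag-1 residual autocorrelation, lag included: {r_with_lag:.2f}")
print(f"lag-1 residual autocorrelation, lag omitted:  {r_without_lag:.2f}")
```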
“Manipulation” of Data 


In empirical analysis, the raw data are often “manipulated.” For example, in time series regressions involving 
quarterly data, such data are usually derived from the monthly data by simply adding three monthly observations 
and dividing the sum by 3. This averaging introduces smoothness into the data by dampening the 
fluctuations in the monthly data. Therefore, the graph plotting the quarterly data looks much smoother than 
the monthly data, and this smoothness may itself lead to a systematic pattern in the disturbances, thereby 
introducing autocorrelation. Another source of manipulation is interpolation or extrapolation of data. For 
example, the Census of Population is conducted every 10 years in this country, the last being in 2001 and 
the one before that in 1991. Now if there is a need to obtain data for some year within the intercensus period 
1991–2001, the common practice is to interpolate on the basis of some ad hoc assumptions. All such data 
“massaging” techniques might impose upon the data a systematic pattern that might not exist in the original 
data. 
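Interpolation can be illustrated in the same spirit. In the sketch below, which assumes serially independent benchmark values observed once every 10 periods (a stand-in for census counts, not real data), the intervening periods are filled in by straight-line interpolation. The interpolated series inherits a strong positive first-order autocorrelation that the benchmark values themselves do not have.

```python
import random

def lag1_autocorr(e):
    """Sample first-order autocorrelation (deviations from the sample mean)."""
    m = sum(e) / len(e)
    num = sum((e[t] - m) * (e[t - 1] - m) for t in range(1, len(e)))
    return num / sum((v - m) ** 2 for v in e)

def interpolate(knots, step):
    """Fill `step - 1` values between consecutive knots by straight-line interpolation."""
    series = []
    for a, b in zip(knots, knots[1:]):
        for k in range(step):
            w = k / step
            series.append((1 - w) * a + w * b)
    series.append(knots[-1])
    return series

rng = random.Random(7)
benchmarks = [rng.gauss(0, 1) for _ in range(300)]   # hypothetical, serially independent
annual = interpolate(benchmarks, 10)                 # one benchmark every 10 periods
r_benchmark = lag1_autocorr(benchmarks)
r_annual = lag1_autocorr(annual)
print(f"lag-1 autocorrelation, benchmark values:   {r_benchmark:.2f}")
print(f"lag-1 autocorrelation, interpolated series: {r_annual:.2f}")
```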


Data Transformation 
As an example of this, consider the following model: 
$$Y_t = \beta_1 + \beta_2 X_t + u_t \qquad (12.1.8)$$


where, say, $Y$ = consumption expenditure and $X$ = income. Since Eq. (12.1.8) holds true at every time period, 
it holds true also in the previous time period, $(t - 1)$. So, we can write Eq. (12.1.8) as 


$$Y_{t-1} = \beta_1 + \beta_2 X_{t-1} + u_{t-1} \qquad (12.1.9)$$


$Y_{t-1}$, $X_{t-1}$, and $u_{t-1}$ are known as the lagged values of $Y$, $X$, and $u$, respectively, here lagged by one period. We 
will see the importance of the lagged values later in the chapter as well as in several places in the text. 
Now if we subtract Eq. (12.1.9) from Eq. (12.1.8), we obtain 


$$\Delta Y_t = \beta_2 \Delta X_t + \Delta u_t \qquad (12.1.10)$$


where $\Delta$, known as the first difference operator, tells us to take successive differences of the variables in 
question. Thus, $\Delta Y_t = (Y_t - Y_{t-1})$, $\Delta X_t = (X_t - X_{t-1})$, and $\Delta u_t = (u_t - u_{t-1})$. For empirical purposes, we write 
Eq. (12.1.10) as 

$$\Delta Y_t = \beta_2 \Delta X_t + v_t \qquad (12.1.11)$$
where $v_t = \Delta u_t = (u_t - u_{t-1})$. 

Equation (12.1.8) is known as the level form and Eq. (12.1.10) is known as the (first) difference form. 
Both forms are often used in empirical analysis. For example, if in Eq. (12.1.8) $Y$ and $X$ represent the 
logarithms of consumption expenditure and income, then in Eq. (12.1.10) $\Delta Y$ and $\Delta X$ will represent changes 
in the logs of consumption expenditure and income. But as we know, a change in the log of a variable is a 
relative change, or a percentage change if it is multiplied by 100. So, instead of studying relationships 
between variables in the level form, we may be interested in their relationships in the growth form. 

Now if the error term in Eq. (12.1.8) satisfies the standard OLS assumptions, particularly the assumption 
of no autocorrelation, it can be shown that the error term $v_t$ in Eq. (12.1.11) is autocorrelated. (The proof is 
given in Appendix 12A, Section 12A.1.) It may be noted here that models like Eq. (12.1.11) are known as 
dynamic regression models, that is, models involving lagged regressands. We will study such models in 
depth in Chapter 17. 

The point of the preceding example is that sometimes autocorrelation may be induced as a result of 
transforming the original model. 
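The autocorrelation induced by differencing can be seen directly. If $u_t$ is white noise with variance $\sigma^2$, then $\operatorname{var}(v_t) = 2\sigma^2$ and $\operatorname{cov}(v_t, v_{t-1}) = E[(u_t - u_{t-1})(u_{t-1} - u_{t-2})] = -\sigma^2$, so the first-order autocorrelation of $v_t$ is $-1/2$. The simulation below (standard normal $u_t$, an arbitrary seed and sample size) checks this numerically.

```python
import random

def lag1_autocorr(e):
    """Sample first-order autocorrelation (deviations from the sample mean)."""
    m = sum(e) / len(e)
    num = sum((e[t] - m) * (e[t - 1] - m) for t in range(1, len(e)))
    return num / sum((v - m) ** 2 for v in e)

rng = random.Random(2024)
u = [rng.gauss(0, 1) for _ in range(20_001)]       # white-noise error of the level form (12.1.8)
v = [u[t] - u[t - 1] for t in range(1, len(u))]    # v_t = u_t - u_{t-1}, the error of (12.1.11)
r_u, r_v = lag1_autocorr(u), lag1_autocorr(v)
print(f"lag-1 autocorrelation of u_t: {r_u:.3f}")
print(f"lag-1 autocorrelation of v_t: {r_v:.3f}   (theory: -0.5)")
```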


⁶ On this, see William H. Greene, op. cit., p. 526. 
Nonstationarity 


We mentioned in Chapter 1 that, while dealing with time series data, we may have to find out if a given time 
series is stationary. Although we will discuss the topic of nonstationary time series more thoroughly in the 
chapters on time series econometrics in Part 5 of the text, loosely speaking, a time series is stationary if its 
characteristics (e.g., mean, variance, and covariance) are time invariant; that is, they do not change over time. 
If that is not the case, we have a nonstationary time series. 

As we will discuss in Part 5, in a regression model such as Eq. (12.1.8), it is quite possible that both $Y$ and 
$X$ are nonstationary and therefore the error $u$ is also nonstationary.⁷ In that case, the error term will exhibit 
autocorrelation. 
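A minimal illustration of the nonstationary case, assuming two independently generated random walks (hypothetical series, $n = 500$): regressing one on the other leaves residuals that behave like a nonstationary series themselves, with a lag-1 autocorrelation close to 1.

```python
import random

def simple_ols(x, y):
    """Two-variable OLS; returns (intercept, slope, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    intercept = ybar - slope * xbar
    return intercept, slope, [b - intercept - slope * a for a, b in zip(x, y)]

def lag1_autocorr(e):
    """Sample first-order autocorrelation (deviations from the sample mean)."""
    m = sum(e) / len(e)
    num = sum((e[t] - m) * (e[t - 1] - m) for t in range(1, len(e)))
    return num / sum((v - m) ** 2 for v in e)

def random_walk(n, rng):
    """Z_t = Z_{t-1} + e_t with Z_0 = 0: a simple nonstationary series."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path

rng = random.Random(11)
y = random_walk(500, rng)
x = random_walk(500, rng)        # generated independently of y
_, _, resid = simple_ols(x, y)
r_spurious = lag1_autocorr(resid)
print(f"lag-1 autocorrelation of the residuals: {r_spurious:.2f}")
```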

In summary, then, there are a variety of reasons why the error term in a regression model may be autocor- 
related. In the rest of the chapter we investigate in some detail the problems posed by autocorrelation and 
what can be done about it. 

It should be noted also that autocorrelation can be positive (Figure 12.3a) as well as negative, although 
most economic time series generally exhibit positive autocorrelation because most of them either move 
upward or downward over extended time periods and do not exhibit a constant up-and-down movement such 
as that shown in Figure 12.3b. 


Figure 12.3 (a) Positive and (b) negative autocorrelation. (Each panel plots $u_t$ against time.) 


⁷ As we will also see in Part 5, even though $Y$ and $X$ are nonstationary, it is possible to find $u$ to be stationary. We will explore 
the implication of that later on. 
12.2 OLS Estimation in the Presence of Autocorrelation 


What happens to the OLS estimators and their variances if we introduce autocorrelation in the disturbances 
by assuming that $E(u_t u_{t+s}) \neq 0$ $(s \neq 0)$ but retain all the other assumptions of the classical model? Note again 
that we are now using the subscript $t$ on the disturbances to emphasize that we are dealing with time series 
data. 

We revert once again to the two-variable regression model to explain the basic ideas involved, namely, 
$Y_t = \beta_1 + \beta_2 X_t + u_t$. To make any headway, we must assume the mechanism that generates $u_t$, for $E(u_t u_{t+s}) \neq 0$ 
$(s \neq 0)$ is too general an assumption to be of any practical use. As a starting point, or first approximation, one 
can assume that the disturbance, or error, terms are generated by the following mechanism: 


$$u_t = \rho u_{t-1} + \varepsilon_t \qquad -1 < \rho < 1 \qquad (12.2.1)$$


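where $\rho$ (= rho) is known as the coefficient of autocovariance and where $\varepsilon_t$ is the stochastic disturbance term 
such that it satisfies the standard OLS assumptions, namely, 


$$E(\varepsilon_t) = 0, \qquad \operatorname{var}(\varepsilon_t) = \sigma_\varepsilon^2, \qquad \operatorname{cov}(\varepsilon_t, \varepsilon_{t+s}) = 0 \quad s \neq 0 \qquad (12.2.2)$$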


In the engineering literature, an error term with the preceding properties is often called a white noise 
error term. What Eq. (12.2.1) postulates is that the value of the disturbance term in period $t$ is equal to $\rho$ 
times its value in the previous period plus a purely random error term. 

The scheme (12.2.1) is known as a Markov first-order autoregressive scheme, or simply a first-order 
autoregressive scheme, usually denoted as AR(1). The name autoregressive is appropriate because Eq. 
(12.2.1) can be interpreted as the regression of $u_t$ on itself lagged one period. It is first order because $u_t$ and its 
immediate past value are involved; that is, the maximum lag is 1. If the model were $u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \varepsilon_t$, 
it would be an AR(2), or second-order, autoregressive scheme, and so on. We will examine such higher-order 
schemes in the chapters on time series econometrics in Part 5. 

In passing, note that $\rho$, the coefficient of autocovariance in Eq. (12.2.1), can also be interpreted as the 
first-order coefficient of autocorrelation, or more accurately, the coefficient of autocorrelation at lag 1.⁹ 

Given the AR(1) scheme, it can be shown that (see Appendix 12A, Section 12A.2): 


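$$\operatorname{var}(u_t) = E(u_t^2) = \frac{\sigma_\varepsilon^2}{1 - \rho^2} \qquad (12.2.3)$$

$$\operatorname{cov}(u_t, u_{t+s}) = E(u_t u_{t-s}) = \rho^s \frac{\sigma_\varepsilon^2}{1 - \rho^2} \qquad (12.2.4)$$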


⁸ If $s = 0$, we obtain $E(u_t^2)$. Since $E(u_t) = 0$ by assumption, $E(u_t^2)$ will represent the variance of the error term, which obviously 
is nonzero (why?). 
⁹ This name can be easily justified. By definition, the (population) coefficient of correlation between $u_t$ and $u_{t-1}$ is 

$$\rho = \frac{E\{[u_t - E(u_t)][u_{t-1} - E(u_{t-1})]\}}{\sqrt{\operatorname{var}(u_t)\operatorname{var}(u_{t-1})}} = \frac{E(u_t u_{t-1})}{\operatorname{var}(u_{t-1})}$$

since $E(u_t) = 0$ for each $t$ and $\operatorname{var}(u_t) = \operatorname{var}(u_{t-1})$ because we are retaining the assumption of homoscedasticity. The reader 
can see that $\rho$ is also the slope coefficient in the regression of $u_t$ on $u_{t-1}$. 
$$\operatorname{cor}(u_t, u_{t+s}) = \rho^s \qquad (12.2.5)$$


where $\operatorname{cov}(u_t, u_{t+s})$ means covariance between error terms $s$ periods apart and where $\operatorname{cor}(u_t, u_{t+s})$ means 
correlation between error terms $s$ periods apart. Note that because of the symmetry property of covariances and 
correlations, $\operatorname{cov}(u_t, u_{t+s}) = \operatorname{cov}(u_t, u_{t-s})$ and $\operatorname{cor}(u_t, u_{t+s}) = \operatorname{cor}(u_t, u_{t-s})$. 

Since $\rho$ is a constant between −1 and +1, Eq. (12.2.3) shows that under the AR(1) scheme, the variance 
of $u_t$ is still homoscedastic, but $u_t$ is correlated not only with its immediate past value but also with its values 
several periods in the past. It is critical to note that $|\rho| < 1$, that is, the absolute value of $\rho$ is less than 1. If, for 
example, $\rho$ is 1, the variances and covariances listed above are not defined. If $|\rho| < 1$, we say that the AR(1) 
process given in Eq. (12.2.1) is stationary; that is, the mean, variance, and covariance of $u_t$ do not change over 
time. If $|\rho|$ is less than 1, then it is clear from Eq. (12.2.4) that the value of the covariance will decline as we 
go into the distant past. We will see the utility of the preceding results shortly. 
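The moment formulas (12.2.3)–(12.2.5) can be checked by simulation. The sketch below assumes $\rho = 0.8$ and $\sigma_\varepsilon^2 = 1$ (illustrative values), starts the AR(1) recursion from its stationary distribution, and compares the sample variance and sample autocorrelations of a long simulated $u_t$ series with $\sigma_\varepsilon^2/(1-\rho^2)$ and $\rho^s$.

```python
import random

def ar1_theory(rho, sigma_eps2, s):
    """Return var(u_t), cov(u_t, u_{t+s}), cor(u_t, u_{t+s}) from Eqs. (12.2.3)-(12.2.5)."""
    var_u = sigma_eps2 / (1 - rho ** 2)
    return var_u, rho ** s * var_u, rho ** s

def simulate_ar1(rho, sigma_eps, n, rng):
    """u_t = rho * u_{t-1} + eps_t, started from the stationary distribution."""
    u = [rng.gauss(0, sigma_eps / (1 - rho ** 2) ** 0.5)]
    for _ in range(n - 1):
        u.append(rho * u[-1] + rng.gauss(0, sigma_eps))
    return u

def sample_autocorr(u, s):
    """Sample autocorrelation of u at lag s (deviations from the sample mean)."""
    m = sum(u) / len(u)
    num = sum((u[t] - m) * (u[t - s] - m) for t in range(s, len(u)))
    return num / sum((v - m) ** 2 for v in u)

rng = random.Random(3)
rho = 0.8
u = simulate_ar1(rho, 1.0, 50_000, rng)
m = sum(u) / len(u)
sample_var = sum((v - m) ** 2 for v in u) / (len(u) - 1)
print(f"variance: theory {ar1_theory(rho, 1.0, 0)[0]:.3f}, sample {sample_var:.3f}")
for s in (1, 2, 3, 5, 10):
    print(f"lag {s:2d}: rho**s = {ar1_theory(rho, 1.0, s)[2]:.3f}, sample = {sample_autocorr(u, s):.3f}")
```

The geometric decay of the sample autocorrelations is the numerical counterpart of the remark that the covariance declines as we go into the distant past.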

We use the AR(1) process not only because of its simplicity compared to higher-order AR 
schemes, but also because in many applications it has proved to be quite useful. Additionally, a considerable 
amount of theoretical and empirical work has been done on the AR(1) scheme. 

Now return to our two-variable regression model: $Y_t = \beta_1 + \beta_2 X_t + u_t$. We know from Chapter 3 that the 
OLS estimator of the slope coefficient is 

$$\hat\beta_2 = \frac{\sum x_t y_t}{\sum x_t^2} \qquad (12.2.6)$$

and its variance is given by 

$$\operatorname{var}(\hat\beta_2) = \frac{\sigma^2}{\sum x_t^2} \qquad (12.2.7)$$

where the small letters as usual denote deviation from the mean values. 
Now under the AR(1) scheme, it can be shown that the variance of this estimator is: 

$$\operatorname{var}(\hat\beta_2)_{\text{AR1}} = \frac{\sigma^2}{\sum x_t^2}\left[1 + 2\rho\frac{\sum x_t x_{t+1}}{\sum x_t^2} + 2\rho^2\frac{\sum x_t x_{t+2}}{\sum x_t^2} + \cdots + 2\rho^{n-1}\frac{x_1 x_n}{\sum x_t^2}\right] \qquad (12.2.8)$$

where $\operatorname{var}(\hat\beta_2)_{\text{AR1}}$ means the variance of $\hat\beta_2$ under a first-order autoregressive scheme. 

A comparison of Eq. (12.2.8) with Eq. (12.2.7) shows the former is equal to the latter times a term that 
depends on $\rho$ as well as the sample autocorrelations between the values taken by the regressor $X$ at various 
lags.¹⁰ And in general we cannot foretell whether $\operatorname{var}(\hat\beta_2)$ is less than or greater than $\operatorname{var}(\hat\beta_2)_{\text{AR1}}$ (but see 
Eq. [12.4.1]). Of course, if $\rho$ is zero, the two formulas will coincide, as they should (why?). Also, if the 
correlations among the successive values of the regressor are very small, the usual OLS variance of the slope 
estimator will not be seriously biased. But, as a general principle, the two variances will not be the same. 

To give some idea about the difference between the variances given in Eqs. (12.2.7) and (12.2.8), assume 
that the regressor $X$ also follows the first-order autoregressive scheme with a coefficient of autocorrelation of 
$r$. Then it can be shown that Eq. (12.2.8) reduces to: 

$$\operatorname{var}(\hat\beta_2)_{\text{AR1}} = \frac{\sigma^2}{\sum x_t^2}\left(\frac{1 + r\rho}{1 - r\rho}\right) = \operatorname{var}(\hat\beta_2)_{\text{OLS}}\left(\frac{1 + r\rho}{1 - r\rho}\right) \qquad (12.2.9)$$


¹⁰ Note that the term $r = \sum x_t x_{t+1} / \sum x_t^2$ is the correlation between $X_t$ and $X_{t+1}$ (or $X_{t-1}$, since the correlation coefficient is 
symmetric); $r^2 = \sum x_t x_{t+2} / \sum x_t^2$ is the correlation between the $X$'s lagged two periods; and so on. 
If, for example, $r = 0.6$ and $\rho = 0.8$, using Eq. (12.2.9) we can check that $\operatorname{var}(\hat\beta_2)_{\text{AR1}} = 2.8462\operatorname{var}(\hat\beta_2)_{\text{OLS}}$. 
To put it another way, $\operatorname{var}(\hat\beta_2)_{\text{OLS}} = \frac{1}{2.8462}\operatorname{var}(\hat\beta_2)_{\text{AR1}} = 0.3514\operatorname{var}(\hat\beta_2)_{\text{AR1}}$. That is, the usual OLS formula 
(i.e., Eq. [12.2.7]) will underestimate the variance of $(\hat\beta_2)_{\text{AR1}}$ by about 65 percent. As you will realize, this 
answer is specific to the given values of $r$ and $\rho$. But the point of this exercise is to warn you that a blind 
application of the usual OLS formulas to compute the variances and standard errors of the OLS estimators 
could give seriously misleading results. 
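The arithmetic of this example is easy to reproduce, and the closed form (12.2.9) can be compared with the bracketed term of (12.2.8) for a regressor that actually follows an AR(1) scheme. In the sketch below the regressor is simulated (hypothetical data, $r = 0.6$, $n = 20{,}000$) and the lag sum in (12.2.8) is truncated at 60, where $\rho^s$ is negligible for $\rho = 0.8$; for a long series the two should agree closely. The printed factors agree with the figures quoted above up to rounding in the fourth decimal.

```python
import random

def bracket_eq_12_2_8(x, rho, max_lag=None):
    """Bracketed term of Eq. (12.2.8):
    1 + 2 * sum_s rho**s * (sum_t x_t x_{t+s}) / (sum_t x_t**2), with x in deviation form."""
    n = len(x)
    xbar = sum(x) / n
    d = [v - xbar for v in x]
    sxx = sum(v * v for v in d)
    max_lag = n - 1 if max_lag is None else max_lag
    total = 1.0
    for s in range(1, max_lag + 1):
        total += 2 * rho ** s * sum(d[t] * d[t + s] for t in range(n - s)) / sxx
    return total

def inflation_factor(r, rho):
    """Closed form (1 + r*rho) / (1 - r*rho) from Eq. (12.2.9)."""
    return (1 + r * rho) / (1 - r * rho)

factor = inflation_factor(0.6, 0.8)
print(f"var_AR1 = {factor:.4f} x var_OLS")       # var_AR1 = 2.8462 x var_OLS
print(f"var_OLS = {1 / factor:.4f} x var_AR1")   # var_OLS = 0.3514 x var_AR1
print(f"understatement: {100 * (1 - 1 / factor):.0f} percent")  # understatement: 65 percent

# Cross-check against Eq. (12.2.8) with a long simulated AR(1) regressor, r = 0.6 (hypothetical data).
rng = random.Random(1)
x, level = [], 0.0
for _ in range(20_000):
    level = 0.6 * level + rng.gauss(0, 1)
    x.append(level)
exact = bracket_eq_12_2_8(x, 0.8, max_lag=60)
print(f"Eq. (12.2.8) bracket for the simulated X: {exact:.3f}")
```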

Suppose we continue to use the OLS estimator $\hat\beta_2$ and adjust the usual variance formula by taking into 
account the AR(1) scheme. That is, we use $\hat\beta_2$ given by Eq. (12.2.6) but use the variance formula given by Eq. 
(12.2.8). What now are the properties of $\hat\beta_2$? It is easy to prove that $\hat\beta_2$ is still linear and unbiased. As a matter 
of fact, as shown in Appendix 3A, Section 3A.2, the assumption of no serial correlation, like the assumption 
of no heteroscedasticity, is not required to prove that $\hat\beta_2$ is unbiased. Is $\hat\beta_2$ still BLUE? Unfortunately, it is not; 
in the class of linear unbiased estimators, it does not have minimum variance. In short, $\hat\beta_2$, although linear 
unbiased, is not efficient (relatively speaking, of course). The reader will notice that this finding is quite 
similar to the finding that $\hat\beta_2$ is less efficient in the presence of heteroscedasticity. There we saw that it was 
the weighted least-squares estimator $\hat\beta_2^*$ given in Eq. (11.3.8), a special case of the generalized least-squares 
(GLS) estimator, that was efficient. In the case of autocorrelation can we find an estimator that is BLUE? The 
answer is yes, as can be seen from the discussion in the following section. 


12.3 The BLUE Estimator in the Presence of Autocorrelation 


Continuing with the two-variable model and assuming the AR(1) process, we can show that the BLUE 
estimator of $\beta_2$ is given by the following expression:¹¹ 


$$\hat\beta_2^{\text{GLS}} = \frac{\sum_{t=2}^{n} (x_t - \rho x_{t-1})(y_t - \rho y_{t-1})}{\sum_{t=2}^{n} (x_t - \rho x_{t-1})^2} + C \qquad (12.3.1)$$

where $C$ is a correction factor that may be disregarded in practice. Note that the subscript $t$ now runs from 
$t = 2$ to $t = n$. And its variance is given by 

$$\operatorname{var}(\hat\beta_2^{\text{GLS}}) = \frac{\sigma^2}{\sum_{t=2}^{n} (x_t - \rho x_{t-1})^2} + D \qquad (12.3.2)$$

where $D$ too is a correction factor that may also be disregarded in practice. (See Exercise 12.18.) 

The estimator $\hat\beta_2^{\text{GLS}}$, as the superscript suggests, is obtained by the method of GLS. As noted in Chapter 
11, in GLS we incorporate any additional information we have (e.g., the nature of the heteroscedasticity or 
of the autocorrelation) directly into the estimating procedure by transforming the variables, whereas in OLS 
such side information is not directly taken into consideration. As the reader can see, the GLS estimator of 
$\beta_2$ given in Eq. (12.3.1) incorporates the autocorrelation parameter $\rho$ in the estimating formula, whereas the 
OLS formula given in Eq. (12.2.6) simply neglects it. Intuitively, this is the reason why the GLS estimator is 
BLUE and not the OLS estimator: the GLS estimator makes the most use of the available information.¹² It 
hardly needs to be added that if $\rho = 0$, there is no additional information to be considered and hence both the 
GLS and OLS estimators are identical. 


¹¹ For proofs, see Jan Kmenta, Elements of Econometrics, Macmillan, New York, 1971, pp. 274–275. The correction factor $C$ 
pertains to the first observation, $(Y_1, X_1)$. On this point see Exercise 12.18. 

¹² The formal proof that $\hat\beta_2^{\text{GLS}}$ is BLUE can be found in Kmenta, ibid. But the tedious algebraic proof can be simplified 
considerably using matrix notation. See J. Johnston, Econometric Methods, 3d ed., McGraw-Hill, New York, 1984, pp. 291–293. 
In short, under autocorrelation, it is the GLS estimator given in Eq. (12.3.1) that is BLUE, and the 
minimum variance is now given by Eq. (12.3.2) and not by Eq. (12.2.8) and obviously not by Eq. (12.2.7). 
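As a sketch, assuming $\rho$ is known, the GLS slope can be computed by applying OLS (with an intercept) to the quasi-differenced variables $X_t - \rho X_{t-1}$ and $Y_t - \rho Y_{t-1}$ for $t = 2, \ldots, n$. This drops the first observation and so corresponds to ignoring the correction factor $C$ in (12.3.1); the means of the transformed variables are taken directly rather than through the deviation notation of (12.3.1). The Monte Carlo design (a fixed, serially independent regressor, $\rho = 0.8$, $n = 200$, 500 replications) is hypothetical and only meant to show that the GLS slope varies less across samples than the OLS slope.

```python
import random

def ols_slope(x, y):
    """OLS slope with an intercept: sum(x_dev * y_dev) / sum(x_dev**2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
            / sum((a - xbar) ** 2 for a in x))

def gls_slope(x, y, rho):
    """GLS slope with rho known and the correction factor C ignored:
    OLS on x_t - rho*x_{t-1}, y_t - rho*y_{t-1} for t = 2, ..., n."""
    xs = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
    ys = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
    return ols_slope(xs, ys)

def sampling_variances(n=200, reps=500, rho=0.8, seed=99):
    """Monte Carlo variances of the OLS and GLS slopes under AR(1) errors (hypothetical design)."""
    rng = random.Random(seed)
    x = [rng.gauss(10, 2) for _ in range(n)]       # regressor held fixed across replications
    sd_u0 = (1 / (1 - rho ** 2)) ** 0.5            # stationary s.d. of u_t when var(eps) = 1
    ols_est, gls_est = [], []
    for _ in range(reps):
        u = [rng.gauss(0, sd_u0)]
        for _ in range(n - 1):
            u.append(rho * u[-1] + rng.gauss(0, 1))
        y = [1.0 + 0.8 * xi + ui for xi, ui in zip(x, u)]
        ols_est.append(ols_slope(x, y))
        gls_est.append(gls_slope(x, y, rho))

    def var(v):
        m = sum(v) / len(v)
        return sum((a - m) ** 2 for a in v) / (len(v) - 1)

    return var(ols_est), var(gls_est)

var_ols, var_gls = sampling_variances()
print(f"sampling variance of OLS slope: {var_ols:.5f}")
print(f"sampling variance of GLS slope: {var_gls:.5f}")
```

With $\rho = 0$ the transformation does nothing, so `gls_slope` reduces to OLS on observations 2 through $n$, in line with the remark that GLS and OLS coincide when there is no autocorrelation.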


A Technical Note 


As we noted in the previous chapter, the Gauss—Markov theorem provides only the sufficient condition for 
OLS to be BLUE. The necessary and sufficient conditions for OLS to be BLUE are given by Kruskal’s 
theorem, mentioned in the previous chapter. Therefore, in some cases it can happen that OLS is BLUE 
despite autocorrelation. But such cases are infrequent in practice. 

What happens if we blithely continue to work with the usual OLS procedure despite autocorrelation? The 
answer is provided in the following section. 


12.4 Consequences of Using OLS in the Presence of Autocorrelation 


As in the case of heteroscedasticity, in the presence of autocorrelation the OLS estimators are still linear 
unbiased as well as consistent and asymptotically normally distributed, but they are no longer efficient (i.e., 
minimum variance). What then happens to our usual hypothesis testing procedures if we continue to use 
the OLS estimators? Again, as in the case of heteroscedasticity, we distinguish two cases. For pedagogical 
purposes we still continue to work with the two-variable model, although the following discussion can be 
extended to multiple regressions without much trouble.¹³ 


OLS Estimation Allowing for Autocorrelation 


As noted, $\hat\beta_2$ is not BLUE, and even if we use $\operatorname{var}(\hat\beta_2)_{\text{AR1}}$, the confidence intervals derived from there are 
likely to be wider than those based on the GLS procedure. As Kmenta shows, this result is likely to be the 
case even if the sample size increases indefinitely.¹⁴ That is, $\hat\beta_2$ is not asymptotically efficient. The implication 
of this finding for hypothesis testing is clear: We are likely to declare a coefficient statistically insignificant 
(i.e., not different from zero) even though in fact (i.e., based on the correct GLS procedure) it may be significant. This 
difference can be seen clearly from Figure 12.4. In this figure we show the 95% OLS [AR(1)] and GLS 
confidence intervals assuming that true $\beta_2 = 0$. Consider a particular estimate of $\beta_2$, say, $b_2$. Since $b_2$ lies in 
the OLS confidence interval, we could accept the hypothesis that true $\beta_2$ is zero with 95 percent confidence. 
But if we were to use the (correct) GLS confidence interval, we could reject the null hypothesis that true $\beta_2$ 
is zero, for $b_2$ lies in the region of rejection. 


Figure 12.4 GLS and OLS 95% confidence intervals (under $H_0\colon \beta_2 = 0$). 


¹³ But matrix algebra becomes almost a necessity to avoid tedious algebraic manipulations. 
¹⁴ See Kmenta, op. cit., pp. 277–278. 


Autocorrelation: What Happens if the Error Terms are Correlated? 447 


The message is: To establish confidence intervals and to test hypotheses, one should use GLS and 
not OLS even though the estimators derived from the latter are unbiased and consistent. (However, see 
Section 12.11 later.) 


OLS Estimation Disregarding Autocorrelation 


The situation is potentially very serious if we not only use $\hat\beta_2$ but also continue to use $\operatorname{var}(\hat\beta_2) = \sigma^2/\sum x_t^2$, 
which completely disregards the problem of autocorrelation; that is, we mistakenly believe that the usual 
assumptions of the classical model hold true. Errors will arise for the following reasons: 


1. The residual variance $\hat\sigma^2 = \sum \hat u_t^2/(n - 2)$ is likely to underestimate the true $\sigma^2$. 

2. As a result, we are likely to overestimate $R^2$. 

3. Even if $\sigma^2$ is not underestimated, $\operatorname{var}(\hat\beta_2)$ may underestimate $\operatorname{var}(\hat\beta_2)_{\text{AR1}}$ (Eq. [12.2.8]), its variance 
under (first-order) autocorrelation, even though the latter is inefficient compared to $\operatorname{var}(\hat\beta_2)^{\text{GLS}}$. 

4. Therefore, the usual $t$ and $F$ tests of significance are no longer valid, and if applied, are likely to 
give seriously misleading conclusions about the statistical significance of the estimated regression 
coefficients. 


To establish some of these propositions, let us revert to the two-variable model. We know from Chapter 3 
that under the classical assumption 

$$\hat\sigma^2 = \frac{\sum \hat u_t^2}{n - 2}$$

provides an unbiased estimator of $\sigma^2$, that is, $E(\hat\sigma^2) = \sigma^2$. But if there is autocorrelation, given by AR(1), it 
can be shown that 

$$E(\hat\sigma^2) = \frac{\sigma^2\{n - [2/(1 - \rho)] - 2\rho r\}}{n - 2} \qquad (12.4.1)$$

where $r = \sum_{t=1}^{n-1} x_t x_{t+1} / \sum_{t=1}^{n} x_t^2$, which can be interpreted as the (sample) correlation coefficient between 
successive values of the $X$'s.¹⁵ If $\rho$ and $r$ are both positive (not an unlikely assumption for most economic 
time series), it is apparent from Eq. (12.4.1) that $E(\hat\sigma^2) < \sigma^2$; that is, the usual residual variance formula, on 
average, will underestimate the true $\sigma^2$. In other words, $\hat\sigma^2$ will be biased downward. Needless to say, this 
bias in $\hat\sigma^2$ will be transmitted to $\operatorname{var}(\hat\beta_2)$ because in practice we estimate the latter by the formula $\hat\sigma^2/\sum x_t^2$. 
But even if $\sigma^2$ is not underestimated, $\operatorname{var}(\hat\beta_2)$ is a biased estimator of $\operatorname{var}(\hat\beta_2)_{\text{AR1}}$, which can be readily 
seen by comparing Eq. (12.2.7) with Eq. (12.2.8),¹⁶ for the two formulas are not the same. As a matter of fact, 
if $\rho$ is positive (which is true of most economic time series) and the $X$'s are positively correlated (also true of 
most economic time series), then it is clear that 

$$\operatorname{var}(\hat\beta_2) < \operatorname{var}(\hat\beta_2)_{\text{AR1}} \qquad (12.4.2)$$

that is, the usual OLS variance of $\hat\beta_2$ underestimates its variance under AR(1) (see Eq. [12.2.9]). Therefore, 
if we use $\operatorname{var}(\hat\beta_2)$, we shall inflate the precision or accuracy (i.e., underestimate the standard error) of the 
estimator $\hat\beta_2$. As a result, in computing the $t$ ratio as $t = \hat\beta_2/\operatorname{se}(\hat\beta_2)$ (under the hypothesis that $\beta_2 = 0$), we shall 
be overestimating the $t$ value and hence the statistical significance of the estimated $\beta_2$. The situation is likely 
to get worse if additionally $\sigma^2$ is underestimated, as noted previously. 


¹⁵ See S. M. Goldfeld and R. E. Quandt, Nonlinear Methods in Econometrics, North Holland Publishing Company, Amsterdam, 
1972, p. 183. In passing, note that if the errors are positively autocorrelated, the $R^2$ value tends to have an upward bias, that 
is, it tends to be larger than the $R^2$ in the absence of such correlation. 


¹⁶ For a formal proof, see Kmenta, op. cit., p. 281. 
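Points 1 and 3 of the list above can also be checked by repeated sampling. The sketch below is a hypothetical design, not the experiment reported in the text: a fixed regressor following an AR(1) scheme with $r = 0.8$, AR(1) errors with $\rho = 0.8$ and $\sigma_\varepsilon^2 = 1$, $n = 50$, and 2,000 replications. It compares the actual sampling variance of $\hat\beta_2$ with the average of the usual OLS estimate $\hat\sigma^2/\sum x_t^2$, and the average $\hat\sigma^2$ with the true $\operatorname{var}(u_t) = 1/(1-\rho^2)$.

```python
import random

def monte_carlo(n=50, reps=2000, rho=0.8, r=0.8, beta1=1.0, beta2=0.8, seed=123):
    """Compare the true sampling variance of the OLS slope with the usual OLS formula."""
    rng = random.Random(seed)
    x, level = [], 0.0
    for _ in range(n):                       # fixed regressor: AR(1) with coefficient r (hypothetical)
        level = r * level + rng.gauss(0, 1)
        x.append(10 + level)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    var_u = 1 / (1 - rho ** 2)               # Eq. (12.2.3) with var(eps) = 1
    slopes, reported, sigma2_hats = [], [], []
    for _ in range(reps):
        u = [rng.gauss(0, var_u ** 0.5)]     # stationary starting value
        for _ in range(n - 1):
            u.append(rho * u[-1] + rng.gauss(0, 1))
        y = [beta1 + beta2 * xi + ui for xi, ui in zip(x, u)]
        ybar = sum(y) / n
        b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
        b1 = ybar - b2 * xbar
        s2 = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
        slopes.append(b2)
        reported.append(s2 / sxx)            # usual OLS variance estimate, cf. Eq. (12.2.7)
        sigma2_hats.append(s2)
    mean_b2 = sum(slopes) / reps
    return {
        "mean_b2": mean_b2,
        "sampling_var_b2": sum((b - mean_b2) ** 2 for b in slopes) / (reps - 1),
        "mean_reported_var_b2": sum(reported) / reps,
        "mean_sigma2_hat": sum(sigma2_hats) / reps,
        "var_u": var_u,
    }

result = monte_carlo()
for key, value in result.items():
    print(f"{key:>22s}: {value:.4f}")
```

Under this design the usual formula should understate the sampling variance of $\hat\beta_2$ by a wide margin, and $\hat\sigma^2$ should fall short of $\operatorname{var}(u_t)$ on average, while the average $\hat\beta_2$ stays close to 0.8, consistent with unbiasedness.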
To see how OLS is likely to underestimate $\sigma^2$ and the variance of $\hat\beta_2$, let us conduct the following Monte 
Carlo experiment. Suppose in the two-variable model we “know” that the true $\beta_1 = 1$ and $\beta_2 = 0.8$. Therefore, 
the stochastic PRF is 


$$Y_t = 1.0 + 0.8X_t + u_t \qquad (12.4.3)$$


Hence, 
$$E(Y_t \mid X_t) = 1.0 + 0.8X_t \qquad (12.4.4)$$


which gives the true population regression line. Let us assume that u, are generated by the first-order autore- 
gressive scheme as follows: 


$$u_t = 0.7u_{t-1} + \varepsilon_t \qquad (12.4.5)$$


where $\varepsilon_t$ satisfy all the OLS assumptions. We assume further for convenience that the $\varepsilon_t$ are normally distributed 
with zero mean and unit (= 1) variance. Equation (12.4.5) postulates that the successive disturbances are 
positively correlated, with a coefficient of autocorrelation of +0.7, a rather high degree of dependence. 

Now, using a table of random normal numbers with zero mean and unit variance, we generated 10 random 
numbers shown in Table 12.1 and then by the scheme (12.4.5) we generated $u_t$. To start off the scheme, we 
need to specify the initial value of $u$, say, $u_0 = 5$. 

Plotting the $u_t$ generated in Table 12.1, we obtain Figure 12.5, which shows that initially each successive 
$u_t$ is higher than its previous value and subsequently it is generally smaller than its previous value, showing, 
in general, a positive autocorrelation. 

Now suppose the values of $X$ are fixed at 1, 2, 3, ..., 10. Then, given these $X$'s, we can generate a sample 
of 10 $Y$ values from Eq. (12.4.3) and the values of $u_t$ given in Table 12.1. The details are given in Table 12.2. 
Using the data of Table 12.2, if we regress $Y$ on $X$, we obtain the following (sample) regression: 


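$$\begin{aligned}
\hat Y_t &= 6.5452 + 0.3051X_t \\
\text{se} &= (0.6153)\quad (0.0992) \\
t &= (10.6366)\quad (3.0763) \\
r^2 &= 0.5419 \qquad \hat\sigma^2 = 0.8114
\end{aligned} \qquad (12.4.6)$$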


Table 12.1 

t     $\varepsilon_t$     $u_t = 0.7u_{t-1} + \varepsilon_t$
0      0          $u_0$ = 5 (assumed)
1      0.464      0.7(5) + 0.464 = 3.964
2      2.026      0.7(3.964) + 2.026 = 4.8008
3      2.455      0.7(4.8010) + 2.455 = 5.8157
4     −0.323      0.7(5.8157) − 0.323 = 3.7480
5     −0.068      0.7(3.7480) − 0.068 = 2.5556
6      0.296      0.7(2.5556) + 0.296 = 2.0849
7     −0.288      0.7(2.0849) − 0.288 = 1.1714
8      1.298      0.7(1.1714) + 1.298 = 2.1180
9      0.241      0.7(2.1180) + 0.241 = 1.7236
10    −0.957      0.7(1.7236) − 0.957 = 0.2495


Note: $\varepsilon_t$ data obtained from A Million Random Digits and One Hundred Thousand Deviates, Rand 
Corporation, Santa Monica, Calif., 1950. 
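The recursion in Table 12.1 can be reproduced in a few lines, using the $\varepsilon_t$ values listed in the table and $u_0 = 5$. Computing at full precision gives $u_3 = 5.8156$ rather than the table's 5.8157, because the table carries $u_2$ forward as 4.8010; later entries can differ in the fourth decimal for the same reason.

```python
# epsilon_t values from Table 12.1, t = 1, ..., 10
eps = [0.464, 2.026, 2.455, -0.323, -0.068, 0.296, -0.288, 1.298, 0.241, -0.957]

def ar1_path(eps, rho=0.7, u0=5.0):
    """Generate u_t = rho * u_{t-1} + eps_t recursively, starting from u_0 (Eq. 12.4.5)."""
    u = [u0]
    for e in eps:
        u.append(rho * u[-1] + e)
    return u

u = ar1_path(eps)
for t in range(1, len(u)):
    print(f"{t:2d}  {eps[t - 1]:7.3f}  {u[t]:8.4f}")
```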
Figure 12.5 Correlation generated by the scheme $u_t = 0.7u_{t-1} + \varepsilon_t$ (Table 12.1). ($u_t$ plotted against time, $t = 1, \ldots, 10$.) 


Table 12.2 Generation of Y Sample Values 

$X_t$    $u_t$       $Y_t = 1.0 + 0.8X_t + u_t$
1      3.9640    1.0 + 0.8(1) + 3.9640 = 5.7640
2      4.8010    1.0 + 0.8(2) + 4.8008 = 7.4008
3      5.8157    1.0 + 0.8(3) + 5.8157 = 9.2157
4      3.7480    1.0 + 0.8(4) + 3.7480 = 7.9480
5      2.5556    1.0 + 0.8(5) + 2.5556 = 7.5556
6      2.0849    1.0 + 0.8(6) + 2.0849 = 7.8849
7      1.1714    1.0 + 0.8(7) + 1.1714 = 7.7714
8      2.1180    1.0 + 0.8(8) + 2.1180 = 9.5180
9      1.7236    1.0 + 0.8(9) + 1.7236 = 9.9236
10     0.2495    1.0 + 0.8(10) + 0.2495 = 9.2495


Note: $u_t$ data obtained from Table 12.1. 
whereas the true regression line is as given by Eq. (12.4.4). Both the regression lines are given in Figure 12.6, 
which shows clearly how much the fitted regression line distorts the true regression line; it seriously 
underestimates the true slope coefficient but overestimates the true intercept. (But note that the OLS estimators are 
still unbiased.) 

Figure 12.6 also shows why the true variance of $u_t$ is likely to be underestimated by the estimator $\hat\sigma^2$, which 
is computed from the $\hat u_t$. The $\hat u_t$ are generally close to the fitted line (which is due to the OLS procedure) but 
deviate substantially from the true PRF. Hence, they do not give a correct picture of $u_t$. To gain some insight 
into the extent of underestimation of true $\sigma^2$, suppose we conduct another sampling experiment. Keeping the 
$X_t$ and $\varepsilon_t$ given in Tables 12.1 and 12.2, let us assume $\rho = 0$, that is, no autocorrelation. The new sample of $Y$ 
values thus generated is given in Table 12.3. 


Figure 12.6 True PRF and the estimated regression line for the data of Table 12.2. (Shown against $X$ from 0 to 10: the fitted line $\hat Y_t = 6.5452 + 0.3051X_t$, the true PRF $Y_t = 1 + 0.8X_t$, and the actual $Y$ values.) 


Table 12.3 Sample of Y Values with Zero Serial Correlation 

$X_t$    $\varepsilon_t = u_t$    $Y_t = 1.0 + 0.8X_t + \varepsilon_t$
1       0.464      2.264
2       2.026      4.626
3       2.455      5.855
4      −0.323      3.877
5      −0.068      4.932
6       0.296      6.096
7      −0.288      6.312
8       1.298      8.698
9       0.241      8.441
10     −0.957      8.043


Note: Since there is no autocorrelation, the $u_t$ and $\varepsilon_t$ are identical. The $\varepsilon_t$ are 
from Table 12.1. 
The regression based on Table 12.3 is as follows: 

$$\begin{aligned}
\hat Y_t &= 2.5345 + 0.6145X_t \\
\text{se} &= (0.6796)\quad (0.1087) \\
t &= (3.7910)\quad (5.6541) \\
r^2 &= 0.7997 \qquad \hat\sigma^2 = 0.9752
\end{aligned} \qquad (12.4.7)$$

This regression is much closer to the “truth” because the $Y$'s are now essentially random. Notice that $\hat\sigma^2$ has 
increased from 0.8114 ($\rho = 0.7$) to 0.9752 ($\rho = 0$). Also notice that the standard errors of $\hat\beta_1$ and $\hat\beta_2$ have 
increased. This result is in accord with the theoretical results considered previously. 
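The two sample regressions (12.4.6) and (12.4.7) can be recomputed from the $Y$ values in Tables 12.2 and 12.3 with $X_t = 1, \ldots, 10$. The sketch below uses a small hand-written two-variable OLS helper (`simple_ols`, defined here, not a library function) and reports the intercept, slope, standard error of the slope, $r^2$, and $\hat\sigma^2$; these should match the reported figures up to rounding.

```python
def simple_ols(x, y):
    """Two-variable OLS with the usual classical-assumption summary statistics."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    b2 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    rss = sum((b - b1 - b2 * a) ** 2 for a, b in zip(x, y))
    tss = sum((b - ybar) ** 2 for b in y)
    sigma2 = rss / (n - 2)                       # residual variance, n - 2 d.f.
    return {"b1": b1, "b2": b2, "se_b2": (sigma2 / sxx) ** 0.5,
            "r2": 1 - rss / tss, "sigma2": sigma2}

X = list(range(1, 11))
y_ar1 = [5.7640, 7.4008, 9.2157, 7.9480, 7.5556, 7.8849, 7.7714, 9.5180, 9.9236, 9.2495]  # Table 12.2
y_white = [2.264, 4.626, 5.855, 3.877, 4.932, 6.096, 6.312, 8.698, 8.441, 8.043]          # Table 12.3

for label, y in (("rho = 0.7", y_ar1), ("rho = 0", y_white)):
    fit = simple_ols(X, y)
    print(label, {k: round(v, 4) for k, v in fit.items()})
```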


12.5 Relationship between Wages and Productivity in the Business Sector of the United States, 1960–2005 


Now that we have discussed the consequences of autocorrelation, the obvious question is, How do we detect it 
and how do we correct for it? Before we turn to these topics, it is useful to consider a concrete example. Table 
12.4 gives data on indexes of real compensation per hour Y (RCOMPB) and output per hour X (PRODB) in 
the business sector of the U.S. economy for the period 1960–2005, the base of the indexes being 1992 = 100. 


Table 12.4 Indexes of Real Compensation and Productivity, U.S., 1960–2005 (Index numbers, 
1992 = 100; quarterly data seasonally adjusted) 

Year    Y             X        Year    Y        X
1960    60.8          48.9     1983    90.3     83.0
1961    62.5          50.6     1984    90.7     85.2
1962    64.6          52.9     1985    92.0     87.1
1963    66.1          55.0     1986    94.9     89.7
1964    67.7          56.8     1987    95.2     90.1
1965    69.1          58.8     1988    96.5     91.5
1966    [illegible]   61.2     1989    95.0     92.4
1967    73.5          62.5     1990    96.2     94.4
1968    76.2          64.7     1991    97.4     95.9
1969    [illegible]   65.0     1992    100.0    100.0
1970    78.8          66.3     1993    99.7     100.4
1971    80.2          69.0     1994    99.0     101.3
1972    82.6          71.2     1995    98.7     101.5
1973    84.3          73.4     1996    99.4     104.5
1974    83.3          72.3     1997    100.5    106.5
1975    84.1          74.8     1998    105.2    109.5
1976    86.4          77.1     1999    108.0    112.8
1977    87.6          78.5     2000    112.0    116.1
1978    89.1          79.3     2001    113.5    119.1
1979    89.3          79.3     2002    115.7    124.0
1980    89.1          79.2     2003    117.7    128.7
1981    89.3          80.8     2004    119.0    132.7
1982    90.4          80.1     2005    120.2    135.7

Notes: Y = index of real compensation per hour, business sector (1992 = 100). 
X = index of output per hour, business sector (1992 = 100). 

Source: Economic Report of the President, 2007, Table B-49. 
Figure 12.7 Index of compensation (Y) and index of productivity (X), United States, 1960–2005. 


First plotting the data on Y and X, we obtain Figure 12.7. Since the relationship between real compensation 
and labor productivity is expected to be positive, it is not surprising that the two variables are positively 
related. What is surprising is that the relationship between the two is almost linear, although there is some hint 
that at higher values of productivity the relationship between the two may be slightly nonlinear. Therefore, we 
decided to estimate a linear as well as a log-linear model, with the following results: 


$$\hat{Y}_t = 32.7419 + 0.6704X_t \qquad (12.5.1)$$
se = (1.3940)   (0.0157)
t = (23.4874)   (42.7813)
$r^2 = 0.9765 \qquad d = 0.1739 \qquad \hat{\sigma} = 2.3845$
where d is the Durbin-Watson statistic, which will be discussed shortly. 
$$\widehat{\ln Y_t} = 1.6067 + 0.6522 \ln X_t \qquad (12.5.2)$$
se = (0.0547)   (0.0124)
t = (29.3680)   (52.7996)
$r^2 = 0.9845 \qquad d = 0.2176 \qquad \hat{\sigma} = 0.0221$


Since the above model is double-log, the slope coefficient represents elasticity. In the present case, we 
see that if labor productivity goes up by 1 percent, the average compensation goes up by about 0.65 percent. 

Qualitatively, both the models give similar results. In both cases the estimated coefficients are "highly" significant, as indicated by the high t values. In the linear model, if the index of productivity goes up by a unit, on average, the index of compensation goes up by about 0.67 units. In the log-linear model, the slope coefficient being an elasticity (why?), we find that if the index of productivity goes up by 1 percent, on average, the index of real compensation goes up by about 0.65 percent.
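The fits in Eqs. (12.5.1) and (12.5.2) are ordinary two-variable OLS regressions, one in levels and one in logarithms. The following is a minimal standard-library Python sketch, not the software used to produce the results above; the function names `ols_two_variable` and `log_linear_fit` are hypothetical. It checks itself on synthetic data generated with a known elasticity of 0.65 rather than on Table 12.4.

```python
import math


def ols_two_variable(x, y):
    """Return (intercept, slope) of the OLS fit y = b1 + b2 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def log_linear_fit(x, y):
    """Fit ln y = b1 + b2 ln x; in this double-log form b2 is an elasticity."""
    return ols_two_variable([math.log(v) for v in x], [math.log(v) for v in y])


# Synthetic data built as y = e^1.6 * x^0.65, so the fit should recover (1.6, 0.65).
x_demo = [50.0, 60.0, 75.0, 90.0, 110.0, 135.0]
y_demo = [math.exp(1.6) * v ** 0.65 for v in x_demo]
b1, b2 = log_linear_fit(x_demo, y_demo)
print(f"intercept = {b1:.4f}, elasticity = {b2:.4f}")
```

Applied to ln Y and ln X from Table 12.4, the same function should give estimates close to those reported in Eq. (12.5.2).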
How reliable are the results given in Eqs. (12.5.1) and (12.5.2) if there is autocorrelation? As stated previously, if there is autocorrelation, the estimated standard errors are biased, as a result of which the estimated
t ratios are unreliable. We obviously need to find out if our data suffer from autocorrelation. In the following 
section we discuss several methods of detecting autocorrelation. We will illustrate these methods with the 
log-linear model (12.5.2). 


12.6 Detecting Autocorrelation 


I. Graphical Method


Recall that the assumption of nonautocorrelation of the classical model relates to the population disturbances $u_t$, which are not directly observable. What we have instead are their proxies, the residuals $\hat{u}_t$, which can be obtained by the usual OLS procedure. Although the $\hat{u}_t$ are not the same thing as the $u_t$,¹⁷ very often a visual examination of the $\hat{u}$'s gives us some clues about the likely presence of autocorrelation in the $u$'s. Actually, a visual examination of $\hat{u}_t$ or $\hat{u}_t^2$ can provide useful information not only about autocorrelation but also about heteroscedasticity (as we saw in the preceding chapter), model inadequacy, or specification bias, as we shall see in the next chapter. As one author notes:


The importance of producing and analyzing plots [of residuals] as a standard part of statistical analysis cannot be overemphasized. Besides occasionally providing an easy to understand summary of a complex problem, they allow the simultaneous examination of the data as an aggregate while clearly displaying the behavior of individual cases.¹⁸


There are various ways of examining the residuals. We can simply plot them against time, the time sequence plot, as we have done in Figure 12.8, which shows the residuals obtained from the log wages-productivity regression (12.5.2). The values of these residuals are given in Table 12.5 along with some other data.

Alternatively, we can plot the standardized residuals against time; these are also shown in Figure 12.8 and Table 12.5. The standardized residuals are simply the residuals ($\hat{u}_t$) divided by the standard error of the regression ($\hat{\sigma}$), that is, they are $\hat{u}_t/\hat{\sigma}$. Notice that $\hat{u}_t$ and $\hat{\sigma}$ are measured in the units in which the regressand is measured. The values of the standardized residuals will therefore be pure numbers (devoid of units of measurement) and can be compared with the standardized residuals of other regressions. Moreover, the standardized residuals, like $\hat{u}_t$, have zero mean (why?) and approximately unit variance.¹⁹ In large samples $\hat{u}_t/\hat{\sigma}$ is approximately normally distributed with zero mean and unit variance. For our example (the log-linear model 12.5.2), $\hat{\sigma} \approx 0.022$.
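Standardized residuals are a one-line transformation. Here is a minimal sketch, assuming $\hat{\sigma} = \sqrt{\text{RSS}/(n-k)}$ with k estimated coefficients (k = 2 for the two-variable model); the helper names are hypothetical. Dividing the first five residuals of Table 12.5 by the rounded value 0.022 reproduces the SDRES column to about three decimal places.

```python
import math


def standard_error_of_regression(residuals, k):
    """sigma-hat = sqrt(RSS / (n - k)), k = number of estimated coefficients."""
    n = len(residuals)
    rss = sum(u * u for u in residuals)
    return math.sqrt(rss / (n - k))


def standardize(residuals, sigma_hat):
    """Standardized residuals u_t / sigma-hat; the result is unit-free."""
    return [u / sigma_hat for u in residuals]


# First five S1 residuals from Table 12.5 and an assumed rounded sigma-hat of 0.022.
s1_head = [-0.036068, -0.030780, -0.026724, -0.029160, -0.026246]
sdres_head = standardize(s1_head, 0.022)
print([round(z, 3) for z in sdres_head])
```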

Examining the time sequence plot given in Figure 12.8, we observe that both $\hat{u}_t$ and the standardized $\hat{u}_t$ exhibit a pattern like that observed in Figure 12.1d, suggesting that perhaps the $u_t$ are not random.


17 Even if the disturbances $u_t$ are homoscedastic and uncorrelated, their estimators, the residuals $\hat{u}_t$, are heteroscedastic and autocorrelated. On this, see G. S. Maddala, Introduction to Econometrics, 2d ed., Macmillan, New York, 1992, pp. 480-481. However, it can be shown that as the sample size increases indefinitely, the residuals tend to converge to their true values, the $u_t$'s. On this see E. Malinvaud, Statistical Methods of Econometrics, 2d ed., North-Holland Publishers, Amsterdam, 1970, p. 88.

18 Sanford Weisberg, Applied Linear Regression, John Wiley & Sons, New York, 1980, p. 120.

19 Actually, it is the so-called studentized residuals that have unit variance. But in practice the standardized residuals will give the same picture, and hence we may rely on them. On this, see Norman Draper and Harry Smith, Applied Regression Analysis, 3d ed., John Wiley & Sons, New York, 1998, pp. 207-208.




Figure 12.8 Residuals (magnified 100 times) and standardized residuals from the wages-productivity regression (log form: model 12.5.2). [Time-sequence plot of 100*S1 against Year, 1960-2005.]


Table 12.5 Residuals: Actual, Standardized, and Lagged

Obs.        S1        SDRES      S1(-1)
1960   -0.036068   -1.639433          NA
1961   -0.030780   -1.399078   -0.036068
1962   -0.026724   -1.214729   -0.030780
1963   -0.029160   -1.325472   -0.026724
1964   -0.026246   -1.193017   -0.029160
1965   -0.028348   -1.288551   -0.026246
1966   -0.017504   -0.795647   -0.028348
1967   -0.006419   -0.291762   -0.017504
1968    0.007094    0.322459   -0.006419
1969    0.018409    0.836791    0.007094
1970    0.024713    1.123311    0.018409
1971    0.016289    0.740413    0.024713
1972    0.025305    1.150208    0.016289
1973    0.025829    1.174049    0.025305
1974    0.023744    1.079278    0.025829
1975    0.011131    0.505948    0.023744
1976    0.018359    0.834515    0.011131
1977    0.020416    0.927990    0.018359
1978    0.030781    1.399185    0.020416
1979    0.033023    1.501051    0.030781
1980    0.031604    1.436543    0.033023
1981    0.020801    0.945516    0.031604
1982    0.038719    1.759960    0.020801
1983    0.014416    0.655291    0.038719
1984    0.001774    0.080626    0.014416
1985    0.001620    0.073640    0.001774
1986    0.013471    0.612317    0.001620
1987    0.013725    0.623875    0.013471
1988    0.017232    0.783269    0.013725
1989   -0.004818   -0.219005    0.017232
1990   -0.006232   -0.283285   -0.004818
1991   -0.004118   -0.187161   -0.006232
1992   -0.005078   -0.230822   -0.004118
1993   -0.010686   -0.485739   -0.005078
1994   -0.023553   -1.070578   -0.010686
1995   -0.027874   -1.266997   -0.023553
1996   -0.039805   -1.809304   -0.027874
1997   -0.041164   -1.871079   -0.039805
1998   -0.013576   -0.617112   -0.041164
1999   -0.006674   -0.303364   -0.013576
2000    0.010887    0.494846   -0.006674
2001    0.007551    0.343250    0.010887
2002    0.000453    0.020599    0.007551
2003   -0.006673   -0.303298    0.000453
2004   -0.015650   -0.711380   -0.006673
2005   -0.020198   -0.918070   -0.015650

Notes: S1 = residuals from the wages-productivity regression (log form).
S1(-1) = residuals lagged one period.
SDRES = standardized residuals = residuals/standard error of estimate.
Figure 12.9 Current residuals versus lagged residuals.


To see this differently, we can plot $\hat{u}_t$ against $\hat{u}_{t-1}$, that is, plot the residuals at time t against their value at time (t − 1), a kind of empirical test of the AR(1) scheme. If the residuals are nonrandom, we should obtain pictures similar to those shown in Figure 12.3. This plot for our log wages-productivity regression is as shown in Figure 12.9; the underlying data are given in Table 12.5. As this figure reveals, most of the residuals are bunched in the first (northeast) and the third (southwest) quadrants, suggesting a strong positive correlation in the residuals.
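The lag plot can also be summarized numerically by counting the pairs $(\hat{u}_{t-1}, \hat{u}_t)$ that fall in each quadrant. The sketch below is illustrative only: `quadrant_counts` is a hypothetical helper, and the residuals are the S1 column of Table 12.5. Of the 45 pairs, 41 fall in the first and third quadrants, consistent with the visual impression of Figure 12.9.

```python
# S1 residuals from Table 12.5 (log-linear model 12.5.2), 1960-2005, in time order.
S1 = [
    -0.036068, -0.030780, -0.026724, -0.029160, -0.026246, -0.028348,
    -0.017504, -0.006419, 0.007094, 0.018409, 0.024713, 0.016289,
    0.025305, 0.025829, 0.023744, 0.011131, 0.018359, 0.020416,
    0.030781, 0.033023, 0.031604, 0.020801, 0.038719, 0.014416,
    0.001774, 0.001620, 0.013471, 0.013725, 0.017232, -0.004818,
    -0.006232, -0.004118, -0.005078, -0.010686, -0.023553, -0.027874,
    -0.039805, -0.041164, -0.013576, -0.006674, 0.010887, 0.007551,
    0.000453, -0.006673, -0.015650, -0.020198,
]


def quadrant_counts(resid):
    """Count lag-plot points (u_{t-1}, u_t) by quadrant; zero is treated as positive."""
    counts = {"I": 0, "II": 0, "III": 0, "IV": 0}
    for prev, curr in zip(resid[:-1], resid[1:]):
        if prev >= 0:
            counts["I" if curr >= 0 else "IV"] += 1   # right half of the plot
        else:
            counts["II" if curr >= 0 else "III"] += 1  # left half of the plot
    return counts


print(quadrant_counts(S1))
```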

The graphical method we have just discussed, although powerful and suggestive, is subjective or qualitative in nature. But there are several quantitative tests that one can use to supplement the purely qualitative approach. We now consider some of these tests.


II. The Runs Test


If we carefully examine Figure 12.8, we notice a peculiar feature: Initially, we have several residuals that are negative, then there is a series of positive residuals, and then there are several residuals that are negative. If these residuals were purely random, could we observe such a pattern? Intuitively, it seems unlikely. This intuition can be checked by the so-called runs test, sometimes also known as the Geary test, a nonparametric test.²⁰

To explain the runs test, let us simply note down the signs (+ or −) of the residuals obtained from the wages-productivity regression, which are given in the first column of Table 12.5.


20 In nonparametric tests we make no assumptions about the (probability) distribution from which the observations are drawn. On the Geary test, see R. C. Geary, "Relative Efficiency of Count Sign Changes for Assessing Residual Autoregression in Least Squares Regression," Biometrika, vol. 57, 1970, pp. 123-127.
(−−−−−−−−)(+++++++++++++++++++++)(−−−−−−−−−−−)(+++)(−−−)      (12.6.1)

Thus there are 8 negative residuals, followed by 21 positive residuals, followed by 11 negative residuals, followed by 3 positive residuals, followed by 3 negative residuals, for a total of 46 observations.

We now define a run as an uninterrupted sequence of one symbol or attribute, such as + or −. We further
define the length of a run as the number of elements in it. In the sequence shown in Eq. (12.6.1), there are 
5 runs: a run of 8 minuses (i.e., of length 8), a run of 21 pluses (i.e., of length 21), a run of 11 minuses (i.e., 
of length 11), a run of 3 pluses (i.e., of length 3), and a run of 3 minuses (i.e., of length 3). For a better visual 
effect, we have presented the various runs in parentheses. 

By examining how runs behave in a strictly random sequence of observations, one can derive a test of 
randomness of runs. We ask this question: Are the 5 runs observed in our illustrative example consisting of 46 
observations too many or too few compared with the number of runs expected in a strictly random sequence 
of 46 observations? If there are too many runs, it would mean that in our example the residuals change sign 
frequently, thus indicating negative serial correlation (cf. Figure 12.3b). Similarly, if there are too few runs, 
they may suggest positive autocorrelation, as in Figure 12.3a. A priori, then, Figure 12.8 would indicate 
positive correlation in the residuals. 

Now let 


$N$ = total number of observations = $N_1 + N_2$
$N_1$ = number of + symbols (i.e., + residuals)
$N_2$ = number of − symbols (i.e., − residuals)
$R$ = number of runs


Then under the null hypothesis that the successive outcomes (here, residuals) are independent, and assuming that $N_1 > 10$ and $N_2 > 10$, the number of runs is (asymptotically) normally distributed with

Mean: $E(R) = \dfrac{2N_1N_2}{N} + 1$
Variance: $\sigma_R^2 = \dfrac{2N_1N_2(2N_1N_2 - N)}{N^2(N-1)}$      (12.6.2)

Note: $N = N_1 + N_2$.
If the null hypothesis of randomness is sustainable, following the properties of the normal distribution, we should expect that

$$\Pr[E(R) - 1.96\sigma_R \le R \le E(R) + 1.96\sigma_R] = 0.95 \qquad (12.6.3)$$
That is, the probability is 95 percent that the preceding interval will include R. Therefore we have this rule: 


Decision Rule Do not reject the null hypothesis of randomness with 95% confidence if R, the number of 
runs, lies in the preceding confidence interval; reject the null hypothesis if the estimated R lies outside these 
limits. (Note: You can choose any level of confidence you want.) 


Returning to our example, we know that $N_1$, the number of pluses, is 24 and $N_2$, the number of minuses, is 22, and $R = 5$. Using the formulas given in Eq. (12.6.2), we obtain:

$$E(R) \approx 24, \qquad \sigma_R^2 \approx 11, \qquad \sigma_R \approx 3.32 \qquad (12.6.4)$$
The 95% confidence interval for R in our example is thus:

$$24 \pm 1.96(3.32) = (17.5,\ 30.5)$$


Obviously, this interval does not include 5. Hence, we can reject the hypothesis that the residuals in our 
wages—productivity regression are random with 95% confidence. In other words, the residuals exhibit 
autocorrelation. As a general rule, if there is positive autocorrelation, the number of runs will be few, whereas 
if there is negative autocorrelation, the number of runs will be many. Of course, from Eq. (12.6.2) we can find 
out whether we have too many runs or too few runs. 
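The runs-test arithmetic can be sketched in a few lines of Python. This assumes the signs are coded as '+' and '-' characters and that the normal approximation (12.6.2) is adequate; `runs_test` is a hypothetical helper. Without rounding, E(R) ≈ 23.96 and σ_R ≈ 3.35, so the 95% interval is roughly (17.4, 30.5), slightly different from the rounded figures above but leading to the same conclusion.

```python
import math
from itertools import groupby


def runs_test(signs, z_crit=1.96):
    """Runs (Geary) test of randomness using the normal approximation (12.6.2)."""
    n1 = signs.count("+")
    n2 = signs.count("-")
    n = n1 + n2
    runs = sum(1 for _ in groupby(signs))  # each group of identical symbols is one run
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    sd = math.sqrt(var)
    lower, upper = mean - z_crit * sd, mean + z_crit * sd
    return {
        "N1": n1, "N2": n2, "R": runs,
        "E(R)": mean, "sd(R)": sd,
        "interval": (lower, upper),
        "reject_randomness": not (lower <= runs <= upper),
    }


# Signs of the 46 residuals in Table 12.5, run by run, as in Eq. (12.6.1).
signs = "-" * 8 + "+" * 21 + "-" * 11 + "+" * 3 + "-" * 3
result = runs_test(signs)
print(result)
```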

Swed and Eisenhart have developed special tables that give critical values of the runs expected in a random sequence of N observations if $N_1$ or $N_2$ is smaller than 20. These tables are given in Appendix D, Table D.6. Using these tables, the reader can verify that the residuals in our wages-productivity regression are indeed nonrandom; actually they are positively correlated.


III. Durbin-Watson d Test²¹


The most celebrated test for detecting serial correlation is that developed by statisticians Durbin and Watson. 
It is popularly known as the Durbin-Watson d statistic, which is defined as 


$$d = \frac{\sum_{t=2}^{t=n} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{t=n} \hat{u}_t^2} \qquad (12.6.5)$$


which is simply the ratio of the sum of squared differences in successive residuals to the RSS. Note that in the numerator of the d statistic the number of observations is n − 1 because one observation is lost in taking successive differences.
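Eq. (12.6.5) translates directly into code. The following sketch, with a hypothetical function name `durbin_watson`, applies it to the S1 residuals of Table 12.5. Because the tabulated residuals are rounded to six decimals, the result should agree with the reported d = 0.2176 only to about three or four decimal places.

```python
# S1 residuals from Table 12.5 (log-linear model 12.5.2), 1960-2005, in time order.
S1 = [
    -0.036068, -0.030780, -0.026724, -0.029160, -0.026246, -0.028348,
    -0.017504, -0.006419, 0.007094, 0.018409, 0.024713, 0.016289,
    0.025305, 0.025829, 0.023744, 0.011131, 0.018359, 0.020416,
    0.030781, 0.033023, 0.031604, 0.020801, 0.038719, 0.014416,
    0.001774, 0.001620, 0.013471, 0.013725, 0.017232, -0.004818,
    -0.006232, -0.004118, -0.005078, -0.010686, -0.023553, -0.027874,
    -0.039805, -0.041164, -0.013576, -0.006674, 0.010887, 0.007551,
    0.000453, -0.006673, -0.015650, -0.020198,
]


def durbin_watson(resid):
    """Durbin-Watson d, Eq. (12.6.5): sum of squared successive differences over RSS."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(u * u for u in resid)
    return num / den


d_table = durbin_watson(S1)
print(round(d_table, 4))
```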

A great advantage of the d statistic is that it is based on the estimated residuals, which are routinely computed in regression analysis. Because of this advantage, it is now a common practice to report the Durbin-Watson d along with summary measures, such as $R^2$, adjusted $R^2$, t, and F. Although it is now routinely used, it is important to note the assumptions underlying the d statistic.

1. The regression model includes the intercept term. If it is not present, as in the case of the regression through the origin, it is essential to rerun the regression including the intercept term to obtain the RSS.²²

2. The explanatory variables, the X’s, are nonstochastic, or fixed in repeated sampling. 

3. The disturbances $u_t$ are generated by the first-order autoregressive scheme: $u_t = \rho u_{t-1} + \varepsilon_t$. Therefore, it cannot be used to detect higher-order autoregressive schemes.

4. The error term $u_t$ is assumed to be normally distributed.

5. The regression model does not include the lagged value(s) of the dependent variable as one of the 
explanatory variables. Thus, the test is inapplicable in models of the following type: 


$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \cdots + \beta_k X_{kt} + \gamma Y_{t-1} + u_t \qquad (12.6.6)$$


where $Y_{t-1}$ is the one-period lagged value of Y. Such models are known as autoregressive models, which we
will study in Chapter 17. 


21 J. Durbin and G. S. Watson, "Testing for Serial Correlation in Least-Squares Regression," Biometrika, vol. 38, 1951, pp. 159-171.

22However, R. W. Farebrother has calculated d values when the intercept term is absent from the model. See his “The 
Durbin-Watson Test for Serial Correlation When There Is No Intercept in the Regression,” Econometrica, vol. 48, 1980, 
pp. 1553-1563. 
6. There are no missing observations in the data. Thus, in our wages-productivity regression for the period 1960-2005, if observations for, say, 1978 and 1982 were missing for some reason, the d statistic would make no allowance for such missing observations.²³

The exact sampling or probability distribution of the d statistic given in Eq. (12.6.5) is difficult to derive because, as Durbin and Watson have shown, it depends in a complicated way on the X values present in a given sample.²⁴ This difficulty should be understandable because d is computed from the $\hat{u}_t$, which are, of course, dependent on the given X's. Therefore, unlike the t, F, or χ² tests, there is no unique critical value that will lead to the rejection or the acceptance of the null hypothesis that there is no first-order serial correlation in the disturbances $u_t$. However, Durbin and Watson were successful in deriving a lower bound $d_L$ and an upper bound $d_U$ such that if the computed d from Eq. (12.6.5) lies outside these critical values, a decision can be made regarding the presence of positive or negative serial correlation. Moreover, these limits depend only on the number of observations n and the number of explanatory variables and do not depend on the values taken by these explanatory variables. These limits, for n going from 6 to 200 and up to 20 explanatory variables, have been tabulated by Durbin and Watson and are reproduced in Appendix D, Table D.5.

The actual test procedure can be explained better with the aid of Figure 12.10, which shows that the limits 
of d are 0 and 4. These can be established as follows. Expand Eq. (12.6.5) to obtain 


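$$d = \frac{\sum \hat{u}_t^2 + \sum \hat{u}_{t-1}^2 - 2\sum \hat{u}_t \hat{u}_{t-1}}{\sum \hat{u}_t^2} \qquad (12.6.7)$$

Since $\sum \hat{u}_t^2$ and $\sum \hat{u}_{t-1}^2$ differ in only one observation, they are approximately equal. Therefore, setting $\sum \hat{u}_{t-1}^2 \approx \sum \hat{u}_t^2$, Eq. (12.6.7) may be written as

$$d \approx 2\left(1 - \frac{\sum \hat{u}_t \hat{u}_{t-1}}{\sum \hat{u}_t^2}\right) \qquad (12.6.8)$$

where ≈ means approximately.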

[Figure: the d axis runs from 0 to 4, with critical points $d_L$, $d_U$, 2, $4 - d_U$, and $4 - d_L$. From left to right the zones are: 0 to $d_L$, reject $H_0$ (evidence of positive autocorrelation); $d_L$ to $d_U$, zone of indecision; $d_U$ to $4 - d_U$, do not reject $H_0$ or $H_0^*$ or both; $4 - d_U$ to $4 - d_L$, zone of indecision; $4 - d_L$ to 4, reject $H_0^*$ (evidence of negative autocorrelation).]
Legend: $H_0$: No positive autocorrelation; $H_0^*$: No negative autocorrelation.
Figure 12.10 Durbin-Watson d statistic.


23For further details, see Gabor Korosi, Laszlo Matyas, and Istvan P. Szekey, Practical Econometrics, Avebury Press, England 
1992, pp. 88-89. 


24 But see the discussion on the "exact" Durbin-Watson test given later in the section.
Now let us define

$$\hat{\rho} = \frac{\sum \hat{u}_t \hat{u}_{t-1}}{\sum \hat{u}_t^2} \qquad (12.6.9)$$

as the sample first-order coefficient of autocorrelation, an estimator of $\rho$. (See footnote 9.) Using Eq. (12.6.9), we can express Eq. (12.6.8) as

$$d \approx 2(1 - \hat{\rho}) \qquad (12.6.10)$$

But since $-1 \le \hat{\rho} \le 1$, Eq. (12.6.10) implies that

$$0 \le d \le 4 \qquad (12.6.11)$$
These are the bounds of d; any estimated d value must lie within these limits. 

It is apparent from Eq. (12.6.10) that if $\hat{\rho} = 0$, $d = 2$; that is, if there is no serial correlation (of the first order), d is expected to be about 2. Therefore, as a rule of thumb, if d is found to be 2 in an application, one may assume that there is no first-order autocorrelation, either positive or negative. If $\hat{\rho} = +1$, indicating perfect positive correlation in the residuals, $d \approx 0$. Therefore, the closer d is to 0, the greater the evidence of positive serial correlation. This relationship should be evident from Eq. (12.6.5) because if there is positive autocorrelation, the $\hat{u}_t$'s will be bunched together and their differences will therefore tend to be small. As a result, the numerator sum of squares will be smaller in comparison with the denominator sum of squares, which remains a unique value for any given regression.

If $\hat{\rho} = -1$, that is, there is perfect negative correlation among successive residuals, $d \approx 4$. Hence, the closer d is to 4, the greater the evidence of negative serial correlation. Again, looking at Eq. (12.6.5), this is understandable. For if there is negative autocorrelation, a positive $\hat{u}_t$ will tend to be followed by a negative $\hat{u}_t$ and vice versa so that $|\hat{u}_t - \hat{u}_{t-1}|$ will usually be greater than $|\hat{u}_t|$. Therefore, the numerator of d will be comparatively larger than the denominator.
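The approximation in Eq. (12.6.10) can be checked numerically. Expanding Eq. (12.6.5) exactly gives $d = 2(1 - \hat{\rho}) - (\hat{u}_1^2 + \hat{u}_n^2)/\sum \hat{u}_t^2$, so the approximation drops only the two end terms, whose weight shrinks as n grows. The sketch below uses an arbitrary made-up residual series (not from the text) to confirm this identity; the helper names are hypothetical. For the wages-productivity regression, d = 0.2176 implies $\hat{\rho} \approx 1 - d/2 \approx 0.89$.

```python
def durbin_watson(resid):
    """Durbin-Watson d, Eq. (12.6.5)."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(u * u for u in resid)


def rho_hat(resid):
    """Sample first-order autocorrelation coefficient, Eq. (12.6.9)."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    return num / sum(u * u for u in resid)


# Arbitrary illustrative residuals (made up, not taken from the text).
resid = [0.5, 0.8, 0.6, -0.2, -0.7, -0.9, -0.3, 0.4, 0.9, 0.2]
d = durbin_watson(resid)
r = rho_hat(resid)
approx = 2 * (1 - r)                                   # Eq. (12.6.10)
end_terms = (resid[0] ** 2 + resid[-1] ** 2) / sum(u * u for u in resid)
# d equals approx - end_terms, up to floating-point rounding.
print(d, approx, approx - end_terms)
```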

The mechanics of the Durbin—Watson test are as follows, assuming that the assumptions underlying the 
test are fulfilled: 


1. Run the OLS regression and obtain the residuals. 

2. Compute d from Eq. (12.6.5). (Most computer programs now do this routinely.) 

3. For the given sample size and given number of explanatory variables, find out the critical $d_L$ and $d_U$ values.

4. Now follow the decision rules given in Table 12.6. For ease of reference, these decision rules are also 
depicted in Figure 12.10. 


To illustrate the mechanics, let us return to our wages-productivity regression. From the data given in Table 12.5 the estimated d value can be shown to be 0.2176, suggesting that there is positive serial correlation in the residuals. From the Durbin-Watson tables, we find that for 46 observations and one explanatory variable, $d_L$ = 1.475 and $d_U$ = 1.566 at the 5 percent level. Since the computed d of 0.2176 lies below $d_L$, we reject the null hypothesis of no positive autocorrelation; that is, there is statistically significant evidence of positive serial correlation in the residuals.


Table 12.6 Durbin-Watson d Test: Decision Rules

Null Hypothesis                             Decision         If
No positive autocorrelation                 Reject           $0 < d < d_L$
No positive autocorrelation                 No decision      $d_L \le d \le d_U$
No negative correlation                     Reject           $4 - d_L < d < 4$
No negative correlation                     No decision      $4 - d_U \le d \le 4 - d_L$
No autocorrelation, positive or negative    Do not reject    $d_U < d < 4 - d_U$
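The decision rules of Table 12.6 can be encoded as a small function. This is a sketch only: `dw_decision` is a hypothetical name, and the critical values $d_L$ and $d_U$ must still be looked up in Appendix D, Table D.5 for the given n and number of regressors. Boundary points are assigned to the indecisive zones, following the inequalities in the table.

```python
def dw_decision(d, d_lower, d_upper):
    """Apply the Table 12.6 decision rules; d_lower and d_upper come from Table D.5."""
    if not 0.0 <= d <= 4.0:
        raise ValueError("d must lie between 0 and 4")
    if d < d_lower:
        return "reject H0: evidence of positive autocorrelation"
    if d <= d_upper:
        return "no decision: indecisive zone (positive side)"
    if d < 4 - d_upper:
        return "do not reject: no evidence of first-order autocorrelation"
    if d <= 4 - d_lower:
        return "no decision: indecisive zone (negative side)"
    return "reject H0*: evidence of negative autocorrelation"


print(dw_decision(0.2176, d_lower=1.475, d_upper=1.566))
```

With the wages-productivity values (d = 0.2176, $d_L$ = 1.475, $d_U$ = 1.566), the first rule applies.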
Although extremely popular, the d test has one great drawback in that, if it falls in the indecisive zone, one cannot conclude that (first-order) autocorrelation does or does not exist. To solve this problem, several authors have proposed modifications of the d test but they are rather involved and beyond the scope of this book.²⁵ In many situations, however, it has been found that the upper limit $d_U$ is approximately the true significance limit and therefore in case d lies in the indecisive zone, one can use the following modified d test: Given the level of significance α,


1. $H_0: \rho = 0$ versus $H_1: \rho > 0$. Reject $H_0$ at level α if $d < d_U$. That is, there is statistically significant positive autocorrelation.

2. $H_0: \rho = 0$ versus $H_1: \rho < 0$. Reject $H_0$ at level α if the estimated $(4 - d) < d_U$, that is, there is statistically significant evidence of negative autocorrelation.

3. $H_0: \rho = 0$ versus $H_1: \rho \ne 0$. Reject $H_0$ at level 2α if $d < d_U$ or $(4 - d) < d_U$, that is, there is statistically significant evidence of autocorrelation, positive or negative.
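The modified d test uses $d_U$ as the critical value, so it never returns "no decision". A minimal sketch (hypothetical function name `modified_dw_test`) follows; the caller supplies $d_U$ for the chosen significance level α.

```python
def modified_dw_test(d, d_upper):
    """Modified d test: d_U serves as the critical value, so there is no indecisive zone."""
    return {
        "positive": d < d_upper,                        # H1: rho > 0, level alpha
        "negative": (4 - d) < d_upper,                  # H1: rho < 0, level alpha
        "two_sided": d < d_upper or (4 - d) < d_upper,  # H1: rho != 0, level 2*alpha
    }


print(modified_dw_test(0.2176, 1.566))
```

Note that a d such as 1.52, which falls in the indecisive zone of the bounds test when $d_L$ = 1.475 and $d_U$ = 1.566, is treated by this version as significant positive autocorrelation.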


It may be pointed out that the indecisive zone narrows as the sample size increases, which can be seen 
clearly from the Durbin—Watson tables. For example, with 4 regressors and 20 observations, the 5 percent 
lower and upper d values are 0.894 and 1.828, respectively, but these values are 1.515 and 1.739 if the sample 
size is 75. 

The computer program SHAZAM performs an exact d test, that is, it gives the p value, the exact probability of the computed d value. With modern computing facilities, it is no longer difficult to find the p value of the computed d statistic. Using SHAZAM (version 9) for our wages-productivity regression, we find the p value of the computed d of 0.2176 is practically zero, thereby reconfirming our earlier conclusion based on the Durbin-Watson tables.

The Durbin-Watson d test has become so venerable that practitioners often forget the assumptions underlying the test. In particular, the assumptions that (1) the explanatory variables, or regressors, are nonstochastic; (2) the error term follows the normal distribution; (3) the regression models do not include the lagged value(s) of the regressand; and (4) only the first-order serial correlation is taken into account are very important for the application of the d test. It should also be added that a significant d statistic may not necessarily indicate autocorrelation. Rather, it may be an indication of omission of relevant variables from the model.

If a regression model contains lagged value(s) of the regressand, the d value in such cases is often around 
2, which would suggest that there is no (first-order) autocorrelation in such models. Thus, there is a built-in 
bias against discovering (first-order) autocorrelation in such models. This does not mean that autoregressive 
models do not suffer from the autocorrelation problem. As a matter of fact, Durbin has developed the so-called 
h test to test serial correlation in such models. But this test is not as powerful, in a statistical sense, as the 
Breusch—Godfrey test to be discussed shortly, so there is no need to use the h test. However, because of its 
historical importance, it is discussed in Exercise 12.36. 

Also, if the error terms $u_t$ are not NIID, the routinely used d test may not be reliable.²⁶ In this respect the runs test discussed earlier has an advantage in that it does not make any (probability) distributional assumption about the error term. However, if the sample is large (technically infinite), we can use the Durbin-Watson d, for it can be shown that²⁷

$$\sqrt{n}\left(1 - \frac{1}{2}d\right) \overset{a}{\sim} N(0, 1) \qquad (12.6.12)$$


25 For details, see Thomas B. Fomby, R. Carter Hill, and Stanley R. Johnson, Advanced Econometric Methods, Springer-Verlag, New York, 1984, pp. 225-228.


26For an advanced discussion, see Ron C. Mittelhammer, George G. Judge, and Douglas J. Miller, Econometric Foundations, 
Cambridge University Press, New York, 2000, p. 550. 


27See James Davidson, Econometric Theory, Blackwell Publishers, New York, 2000, p. 161. 
That is, in large samples the d statistic as transformed in Eq. (12.6.12) follows the standard normal distribution. Incidentally, in view of the relationship between d and $\hat{\rho}$, the estimated first-order autocorrelation coefficient, shown in Eq. (12.6.10), it follows that

$$\sqrt{n}\,\hat{\rho} \overset{a}{\sim} N(0, 1) \qquad (12.6.13)$$


that is, in large samples, the square root of the sample size times the estimated first-order autocorrelation 
coefficient also follows the standard normal distribution. 

As an illustration of the test, for our wages-productivity example, we found that d = 0.2176 with n = 46. Therefore, from Eq. (12.6.12) we find that

$$\sqrt{46}\left(1 - \frac{0.2176}{2}\right) \approx 6.044$$

Asymptotically, if the null hypothesis of zero (first-order) autocorrelation were true, the probability of obtaining a Z value (i.e., a standardized normal variable) of as much as 6.044 or greater is extremely small. Recall that for a standard normal distribution, the (two-tail) critical 5 percent Z value is only 1.96 and the 1 percent critical Z value is about 2.58. Although our sample size is only 46, for practical purposes it may be large enough to use the normal approximation. The conclusion remains the same, namely, that the residuals from the wages-productivity regression suffer from autocorrelation.
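The large-sample version of the test needs only a standard normal tail probability, which Python's `math.erfc` provides. The helper names are hypothetical; this is a sketch of Eq. (12.6.12), not an exact finite-sample d test such as the one SHAZAM reports.

```python
import math


def dw_normal_z(d, n):
    """Large-sample statistic sqrt(n) * (1 - d / 2), Eq. (12.6.12)."""
    return math.sqrt(n) * (1 - d / 2)


def upper_tail_p(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


z = dw_normal_z(0.2176, 46)
print(round(z, 3), upper_tail_p(z))
```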

But the most serious problem with the d test is the assumption that the regressors are nonstochastic, that is, their values are fixed in repeated sampling. If this is not the case, then the d test is not valid either in finite, or small, samples or in large samples.²⁸ And since this assumption is usually difficult to maintain in economic models involving time series data, one author contends that the Durbin-Watson statistic may not be useful in econometrics involving time series data.²⁹ In his view, more useful tests of autocorrelation are available, but they are all based on large samples. We discuss one such test below, the Breusch-Godfrey test.


IV. A General Test of Autocorrelation: The Breusch-Godfrey (BG) Test³⁰


To avoid some of the pitfalls of the Durbin-Watson d test of autocorrelation, statisticians Breusch and Godfrey have developed a test of autocorrelation that is general in the sense that it allows for (1) stochastic regressors, such as the lagged values of the regressand; (2) higher-order autoregressive schemes, such as AR(1), AR(2), etc.; and (3) simple or higher-order moving averages of white noise error terms, such as $\varepsilon_t$ in Eq. (12.2.1).³¹

Without going into the mathematical details, which can be obtained from the references, the BG test, which is also known as the LM test,³² proceeds as follows: We use the two-variable regression model to illustrate the test, although many regressors can be added to the model. Also, lagged values of the regressand can be added to the model. Let

$$Y_t = \beta_1 + \beta_2 X_t + u_t \qquad (12.6.14)$$


28 Ibid., p. 161.

29Fumio Hayashi, Econometrics, Princeton University Press, Princeton, NJ, 2000, p. 45. 

30 See L. G. Godfrey, "Testing Against General Autoregressive and Moving Average Error Models When the Regressor Includes Lagged Dependent Variables," Econometrica, vol. 46, 1978, pp. 1293-1302, and T. S. Breusch, "Testing for Autocorrelation in Dynamic Linear Models," Australian Economic Papers, vol. 17, 1978, pp. 334-355.

31 For example, in the regression $Y_t = \beta_1 + \beta_2 X_t + u_t$, the error term can be represented as $u_t = \varepsilon_t + \lambda_1 \varepsilon_{t-1} + \lambda_2 \varepsilon_{t-2}$, which represents a three-period moving average of the white noise error term $\varepsilon_t$.

32The test is based on the Lagrange multiplier principle briefly mentioned in Chapter 8. 
Assume that the error term $u_t$ follows the pth-order autoregressive, AR(p), scheme as follows:

$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \cdots + \rho_p u_{t-p} + \varepsilon_t \qquad (12.6.15)$$


where $\varepsilon_t$ is a white noise error term as discussed previously. As you will recognize, this is simply the extension
of the AR(1) scheme. 
The null hypothesis $H_0$ to be tested is that

$$H_0: \rho_1 = \rho_2 = \cdots = \rho_p = 0 \qquad (12.6.16)$$


That is, there is no serial correlation of any order. The BG test involves the following steps: 

1. Estimate Eq. (12.6.14) by OLS and obtain the residuals, $\hat{u}_t$.

2. Regress $\hat{u}_t$ on the original $X_t$ (if there is more than one X variable in the original model, include them also) and $\hat{u}_{t-1}, \hat{u}_{t-2}, \ldots, \hat{u}_{t-p}$, where the latter are the lagged values of the estimated residuals in step 1. Thus, if p = 4, we will introduce four lagged values of the residuals as additional regressors in the model. Note that to run this regression we will have only (n − p) observations (why?). In short, run the following regression:

$$\hat{u}_t = \alpha_1 + \alpha_2 X_t + \hat{\rho}_1 \hat{u}_{t-1} + \hat{\rho}_2 \hat{u}_{t-2} + \cdots + \hat{\rho}_p \hat{u}_{t-p} + \varepsilon_t \qquad (12.6.17)$$

and obtain $R^2$ from this (auxiliary) regression.³³


3. If the sample size is large (technically, infinite), Breusch and Godfrey have shown that

$$(n - p)R^2 \overset{a}{\sim} \chi^2_p \qquad (12.6.18)$$

That is, asymptotically, n − p times the $R^2$ value obtained from the auxiliary regression (12.6.17) follows the chi-square distribution with p df. If in an application, $(n - p)R^2$ exceeds the critical chi-square value at the chosen level of significance, we reject the null hypothesis, in which case at least one ρ in Eq. (12.6.15) is statistically significantly different from zero.
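The three BG steps can be sketched without any statistics library. This pure-Python illustration is a sketch under stated assumptions, not a production implementation: the helper names (`solve_linear`, `ols`, `breusch_godfrey`) are hypothetical, OLS is solved through the normal equations, the statistic is $(n - p)R^2$ as in Eq. (12.6.18), and the data are simulated with AR(1) errors (ρ = 0.9) because the regression of Exercise 12.25 is not reproduced here. Statistical packages may treat the first p observations differently, so their numbers can differ slightly.

```python
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS via the normal equations; each row must start with 1.0 for the intercept."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    y_bar = sum(y) / len(y)
    rss = sum(e * e for e in resid)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return beta, resid, 1.0 - rss / tss


def breusch_godfrey(x, y, p):
    """BG (LM) statistic (n - p) * R^2 from the auxiliary regression (12.6.17)."""
    n = len(y)
    _, u_hat, _ = ols([[1.0, xi] for xi in x], y)           # step 1: residuals of Eq. (12.6.14)
    rows = [[1.0, x[t]] + [u_hat[t - j] for j in range(1, p + 1)]
            for t in range(p, n)]                            # step 2: only n - p usable rows
    _, _, r2_aux = ols(rows, u_hat[p:])
    return (n - p) * r2_aux                                  # step 3: compare with chi-square, p df


# Simulated example (not the book's data): y_t = 1 + 0.5 x_t + u_t with AR(1) errors, rho = 0.9.
rng = random.Random(42)
n_obs, rho = 200, 0.9
x_sim = [float(t) for t in range(1, n_obs + 1)]
u_sim, prev = [], 0.0
for _ in range(n_obs):
    prev = rho * prev + rng.gauss(0.0, 1.0)
    u_sim.append(prev)
y_sim = [1.0 + 0.5 * xi + ui for xi, ui in zip(x_sim, u_sim)]

lm = breusch_godfrey(x_sim, y_sim, p=1)
CHI2_CRIT_1DF_5PCT = 3.841  # 5 percent critical value of chi-square with 1 df
print(round(lm, 2), lm > CHI2_CRIT_1DF_5PCT)
```

With p = 1 the statistic should far exceed the 5 percent critical value of 3.841. For the book's AR(6) illustration, the final step is simply 40 × 0.7498 = 29.992, compared with a chi-square distribution with 6 df.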

The following practical points about the BG test may be noted: 

1. The regressors included in the regression model may contain lagged values of the regressand Y, that is, $Y_{t-1}$, $Y_{t-2}$, etc., may appear as explanatory variables. Contrast this model with the Durbin-Watson test restriction that there may be no lagged values of the regressand among the regressors.

2. As noted earlier, the BG test is applicable even if the disturbances follow a pth-order moving average (MA) process, that is, the $u_t$ are generated as follows:

$$u_t = \varepsilon_t + \lambda_1 \varepsilon_{t-1} + \lambda_2 \varepsilon_{t-2} + \cdots + \lambda_p \varepsilon_{t-p} \qquad (12.6.19)$$

where $\varepsilon_t$ is a white noise error term, that is, the error term that satisfies all the classical assumptions.

In the chapters on time series econometrics, we will study in some detail the pth-order autoregressive and 
moving average processes. 

3. If in Eq. (12.6.15) p = 1, meaning first-order autoregression, then the BG test is known as Durbin's m test.

4. A drawback of the BG test is that the value of p, the length of the lag, cannot be specified a priori. Some 
experimentation with the p value is inevitable. Sometimes one can use the so-called Akaike and Schwarz 
information criteria to select the lag length. We will discuss these criteria in Chapter 13 and later in the 
chapters on time series econometrics. 

5. Given the values of the X variable(s) and the lagged values of u, the test assumes that the variance of u 
in Eq. (12.6.15) is homoscedastic. 


33 The reason that the original regressor X is included in the model is to allow for the fact that X may not be strictly nonstochastic. But if it is strictly nonstochastic, it may be omitted from the model. On this, see Jeffrey M. Wooldridge, Introductory Econometrics: A Modern Approach, South-Western Publishing Co., 2003, p. 386.
Illustration of the BG Test: The Wages–Productivity Relation


To illustrate the test, we apply it to our illustrative example. Using an AR(6) scheme, we obtained the results shown in Exercise 12.25. From the regression results given there, it can be seen that $(n - p) = 40$ and $R^2 = 0.7498$. Multiplying these two, we obtain a chi-square value of 29.992. For 6 df (why?), the probability of obtaining a chi-square value of 29.992 or greater is extremely small. The chi-square table in Appendix D.4 shows that the probability of obtaining a chi-square value of 18.5476 or greater is only 0.005. Therefore, for the same df, the probability of obtaining a chi-square value of about 30 must be much smaller still. As a matter of fact, the actual p value is almost zero.
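The arithmetic of this illustration can be checked directly. The sketch below uses only the numbers quoted in the text. Because the degrees of freedom are even (6 here), the chi-square tail probability has a closed form, so no tables or statistical libraries are needed. The function name is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi-square(df) >= x) for an even df, via the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!  (hypothetical helper)."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    h = x / 2.0
    return math.exp(-h) * sum(h ** k / math.factorial(k) for k in range(df // 2))

lm = 40 * 0.7498                                # (n - p) * R^2 from the text
print(round(lm, 3))                             # 29.992
print(round(chi2_sf_even_df(18.5476, 6), 4))    # 0.005, the table value
print(chi2_sf_even_df(lm, 6))                   # roughly 4e-05, i.e., almost zero
```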

Therefore, the conclusion is that, for our example, at least one of the six autocorrelations must be nonzero. 

Trying varying lag lengths from 1 to 6, we find that only the AR(1) coefficient is significant, suggesting that 
there is no need to consider more than one lag. In essence the BG test in this case turns out to be Durbin’s 
m test. 


Why so Many Tests of Autocorrelation? 


The answer to this question is that “. . . no particular test has yet been judged to be unequivocally best [i.e., more powerful in the statistical sense], and thus the analyst is still in the unenviable position of considering a varied collection of test procedures for detecting the presence or structure, or both, of autocorrelation.”³⁴ Of course, a similar argument can be made about the various tests of heteroscedasticity discussed in the previous chapter.


12.7 What to Do When You Find Autocorrelation: Remedial Measures


If after applying one or more of the diagnostic tests of autocorrelation discussed in the previous section, we 
find that there is autocorrelation, what then? We have four options: 

1. Try to find out if the autocorrelation is pure autocorrelation and not the result of mis-specification of 
the model. As we discussed in Section 12.1, sometimes we observe patterns in residuals because the model is 
mis-specified—that is, it has excluded some important variables—or because its functional form is incorrect. 

2. If it is pure autocorrelation, one can use an appropriate transformation of the original model so that in the transformed model we do not have the problem of (pure) autocorrelation. As in the case of heteroscedasticity, we will have to use some type of generalized least-squares (GLS) method.

3. In large samples, we can use the Newey—West method to obtain standard errors of OLS estimators 
that are corrected for autocorrelation. This method is actually an extension of White's heteroscedasticity- 
consistent standard errors method that we discussed in the previous chapter. 

4. In some situations we can continue to use the OLS method. 

Because of the importance of each of these topics, we devote a section to each one. 


12.8 Model Mis-Specification versus Pure Autocorrelation 


Let us return to our wages—productivity regression given in Eq. (12.5.2). There we saw that the d value 
was 0.2176 and based on the Durbin—Watson d test we concluded that there was positive correlation in the 


34 Ron C. Mittelhammer et al., op. cit., p. 547. Recall that the power of a statistical test is 1 minus the probability of committing a Type II error, that is, 1 minus the probability of accepting a false hypothesis. The maximum power of a test is 1 and the minimum is 0. The closer the power of a test is to zero, the worse the test; the closer it is to 1, the more powerful the test. What these authors are essentially saying is that there is no single most powerful test of autocorrelation.
error term. Could this correlation have arisen because our model was not correctly specified? Since the data underlying regression (12.5.1) are time series data, it is quite possible that both wages and productivity exhibit trends. If that is the case, then we need to include the time, or trend, variable t in the model to see the relationship between wages and productivity net of the trends in the two variables.

To test this, we included the trend variable in Eq. (12.5.2) and obtained the following results:

$$\begin{aligned}
\hat{Y}_t &= 0.1209 + 1.0283X_t - 0.0075t \\
\text{se} &= (0.3070) \quad (0.0776) \quad (0.0015) \\
t &= (0.3939) \quad (13.2594) \quad (-4.8903) \\
& R^2 = 0.9900; \quad d = 0.4497
\end{aligned} \tag{12.8.1}$$

The interpretation of this model is straightforward: Over time, the index of real wages has been decreasing by about 0.75 units per year. After allowing for this, if the productivity index went up by one unit, on average, the real compensation went up by about one unit. What is interesting to note is that even allowing for the trend variable, the d value is still very low, suggesting that Eq. (12.8.1) suffers from pure autocorrelation and not necessarily specification error.

How do we know that Eq. (12.8.1) is the correct specification? To test this, we regress Y on X and $X^2$ to test for the possibility that the real wage index may be nonlinearly related to the productivity index. The results of this regression are as follows:

$$\begin{aligned}
\hat{Y}_t &= -1.7843 + 2.1963X_t - 0.1752X_t^2 \\
t &= (-2.7713) \quad (7.5040) \quad (-5.2785) \\
& R^2 = 0.9906; \quad d = 0.3561
\end{aligned} \tag{12.8.2}$$

We leave it to the reader to interpret these results. For present purposes, look at the Durbin–Watson d value, which is still quite low, suggesting that we still have positive serial correlation in the residuals.

It may be safe to conclude from the preceding analysis that our wages–productivity regression probably suffers from pure autocorrelation and not necessarily from specification bias. Knowing the consequences of autocorrelation, we may therefore want to take some corrective action. We will do so shortly.

Incidentally, for all the wages–productivity regressions that we have presented above, we applied the Jarque–Bera test of normality and found that the residuals were normally distributed, which is comforting because the d test assumes normality of the error term.
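For readers who want to reproduce diagnostics of this kind, here is a minimal NumPy sketch of the two residual statistics used in this section: the Durbin–Watson d and the Jarque–Bera statistic. The data are simulated (a trending X with AR(1) errors) purely for illustration and are not the wages–productivity data. As in Eq. (12.8.1), the regression includes the trend variable t. The function names are hypothetical.

```python
import numpy as np

def durbin_watson(e):
    """d = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def jarque_bera(e):
    """JB = n * (S^2 / 6 + (K - 3)^2 / 24); roughly chi-square(2) under normality."""
    e = np.asarray(e, dtype=float)
    c = e - e.mean()
    m2 = np.mean(c ** 2)
    s = np.mean(c ** 3) / m2 ** 1.5      # skewness
    k = np.mean(c ** 4) / m2 ** 2        # kurtosis
    return e.size * (s ** 2 / 6.0 + (k - 3.0) ** 2 / 24.0)

# Made-up trending data with AR(1) errors, fitted with a trend term
rng = np.random.default_rng(1)
n = 46
t = np.arange(1, n + 1, dtype=float)
x = 50.0 + 1.5 * t + rng.normal(scale=2.0, size=n)
u = np.zeros(n)
for i in range(1, n):
    u[i] = 0.9 * u[i - 1] + rng.normal()
y = 10.0 + 0.8 * x - 0.2 * t + u
X = np.column_stack([np.ones(n), x, t])   # constant, X, trend
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
print(durbin_watson(resid), jarque_bera(resid))
```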


12.9 Correcting for (Pure) Autocorrelation: The Method of Generalized 
Least Squares (GLS) 


Knowing the consequences of autocorrelation, especially the lack of efficiency of OLS estimators, we may 
need to remedy the problem. The remedy depends on the knowledge one has about the nature of interdepen- 
dence among the disturbances, that is, knowledge about the structure of autocorrelation. 
As a starter, consider the two-variable regression model:

$$Y_t = \beta_1 + \beta_2 X_t + u_t \tag{12.9.1}$$

and assume that the error term follows the AR(1) scheme, namely,

$$u_t = \rho u_{t-1} + \varepsilon_t, \qquad -1 < \rho < 1 \tag{12.9.2}$$

Now we consider two cases: (1) ρ is known and (2) ρ is not known but has to be estimated.
When ρ Is Known


If the coefficient of first-order autocorrelation is known, the problem of autocorrelation can be easily solved. If Eq. (12.9.1) holds true at time t, it also holds true at time (t − 1). Hence,

$$Y_{t-1} = \beta_1 + \beta_2 X_{t-1} + u_{t-1} \tag{12.9.3}$$

Multiplying Eq. (12.9.3) by ρ on both sides, we obtain

$$\rho Y_{t-1} = \rho\beta_1 + \rho\beta_2 X_{t-1} + \rho u_{t-1} \tag{12.9.4}$$

Subtracting Eq. (12.9.4) from Eq. (12.9.1) gives

$$(Y_t - \rho Y_{t-1}) = \beta_1(1 - \rho) + \beta_2(X_t - \rho X_{t-1}) + \varepsilon_t \tag{12.9.5}$$

where $\varepsilon_t = (u_t - \rho u_{t-1})$. We can express Eq. (12.9.5) as

$$Y_t^* = \beta_1^* + \beta_2^* X_t^* + \varepsilon_t \tag{12.9.6}$$

where $\beta_1^* = \beta_1(1 - \rho)$, $Y_t^* = (Y_t - \rho Y_{t-1})$, $X_t^* = (X_t - \rho X_{t-1})$, and $\beta_2^* = \beta_2$.

Since the error term in Eq. (12.9.6) satisfies the usual OLS assumptions, we can apply OLS to the transformed variables $Y^*$ and $X^*$ and obtain estimators with all the optimum properties, namely, BLUE. In effect, running Eq. (12.9.6) is tantamount to using the generalized least squares (GLS) discussed in the previous chapter. Recall that GLS is nothing but OLS applied to the transformed model that satisfies the classical assumptions.
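Here is a minimal sketch of the generalized difference regression for a known ρ, using NumPy and made-up data. As in Eq. (12.9.5), it drops the first observation. It then runs OLS on $Y_t^*$ and $X_t^*$ and recovers $\beta_1$ from $\beta_1^* = \beta_1(1 - \rho)$. To make the result checkable, the error path is chosen so that $\varepsilon_t = 0$ exactly. The function names are hypothetical.

```python
import numpy as np

def quasi_difference(v, rho):
    """v_t - rho * v_{t-1} for t = 2, ..., n (the first observation is lost)."""
    v = np.asarray(v, dtype=float)
    return v[1:] - rho * v[:-1]

def gls_known_rho(y, x, rho):
    """OLS on the generalized difference equation (12.9.6) for a known rho.
    Returns estimates of the ORIGINAL beta1 and beta2 of Eq. (12.9.1)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    ys, xs = quasi_difference(y, rho), quasi_difference(x, rho)
    Z = np.column_stack([np.ones(ys.size), xs])
    (b1_star, b2_star), *_ = np.linalg.lstsq(Z, ys, rcond=None)
    return b1_star / (1.0 - rho), b2_star    # beta1 = beta1*/(1 - rho), beta2 = beta2*

# Made-up check: u_t = 0.5 u_{t-1} exactly (eps_t = 0), so the transformed error is zero
t = np.arange(20, dtype=float)
x = t ** 1.5
u = 4.0 * 0.5 ** t
y = 3.0 + 2.0 * x + u
print(gls_known_rho(y, x, 0.5))   # recovers beta1 = 3 and beta2 = 2 up to rounding
```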

Regression (12.9.5) is known as the generalized, or quasi, difference equation. It involves regressing Y on X, not in the original form, but in the difference form, which is obtained by subtracting a proportion (= ρ) of the value of a variable in the previous time period from its value in the current time period. In this differencing procedure we lose one observation because the first observation has no antecedent. To avoid this loss of one observation, the first observation on Y and X is transformed as follows:³⁵ $Y_1\sqrt{1 - \rho^2}$ and $X_1\sqrt{1 - \rho^2}$. This transformation is known as the Prais–Winsten transformation.
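The sketch below uses NumPy and hypothetical function names to show why the Prais–Winsten scaling of the first observation matters. It writes the whole transformation as a matrix P. It then checks that P turns the covariance matrix of a stationary AR(1) error into $\sigma^2 I$, meaning the transformed errors are homoscedastic and serially uncorrelated. Applying the same matrix to a column of ones shows how the intercept column is transformed.

```python
import numpy as np

def prais_winsten_matrix(n, rho):
    """n x n matrix P such that P @ v returns
    [sqrt(1 - rho^2) v_1, v_2 - rho v_1, ..., v_n - rho v_{n-1}]."""
    P = np.eye(n) - rho * np.eye(n, k=-1)    # -rho on the subdiagonal
    P[0, 0] = np.sqrt(1.0 - rho ** 2)        # rescale, rather than drop, the first row
    return P

def ar1_covariance(n, rho, sigma2=1.0):
    """Covariance of a stationary AR(1) error: sigma2 / (1 - rho^2) * rho^|i - j|."""
    idx = np.arange(n)
    return sigma2 / (1.0 - rho ** 2) * rho ** np.abs(idx[:, None] - idx[None, :])

n, rho = 6, 0.9
P = prais_winsten_matrix(n, rho)
Omega = ar1_covariance(n, rho)
print(np.allclose(P @ Omega @ P.T, np.eye(n)))   # True: homoscedastic, uncorrelated errors
print(P @ np.ones(n))   # transformed intercept column: sqrt(1 - rho^2), then 1 - rho
```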


When ρ Is Not Known


Although conceptually straightforward to apply, the method of generalized difference given in Eq. (12.9.5) is difficult to implement because ρ is rarely known in practice. Therefore, we need to find ways of estimating ρ. We have several possibilities.


The First-Difference Method 


Since ρ lies between 0 and ±1, one could start from two extreme positions. At one extreme, one could assume that ρ = 0, that is, no (first-order) serial correlation, and at the other extreme we could let ρ = ±1, that is, perfect positive or negative correlation. As a matter of fact, when a regression is run, one generally assumes that there is no autocorrelation and then lets the Durbin–Watson or other test show whether this assumption

35 The loss of one observation may not be very serious in large samples but can make a substantial difference in the results in small samples. Without transforming the first observation as indicated, the error variance will not be homoscedastic. On this, see Jeffrey Wooldridge, op. cit., p. 388. For some Monte Carlo results on the importance of the first observation, see Russell Davidson and James G. MacKinnon, Estimation and Inference in Econometrics, Oxford University Press, New York, 1993, Table 10.1, p. 349.
is justified. If, however, ρ = +1, the generalized difference equation (12.9.5) reduces to the first-difference equation:

$$Y_t - Y_{t-1} = \beta_2(X_t - X_{t-1}) + (u_t - u_{t-1})$$

or

$$\Delta Y_t = \beta_2 \Delta X_t + \varepsilon_t \tag{12.9.7}$$

where Δ is the first-difference operator introduced in Eq. (12.1.10).

Since the error term in Eq. (12.9.7) is free from (first-order) serial correlation (why?), to run the regression 
(12.9.7) all one has to do is form the first differences of both the regressand and regressor(s) and run the 
regression on these first differences. 

The first-difference transformation may be appropriate if the coefficient of autocorrelation is very high, say in excess of 0.8, or the Durbin–Watson d is quite low. Maddala has proposed this rough rule of thumb: Use the first-difference form whenever $d < R^2$.³⁶ This is the case in our wages–productivity regression (12.5.2), where we found that d = 0.2176 and $r^2$ = 0.9845. The first-difference regression for our illustrative example will be presented shortly.

An interesting feature of the first-difference model (12.9.7) is that there is no intercept in it. Hence, to 
estimate Eq. (12.9.7), you have to use the regression through the origin routine (that is, suppress the intercept 
term), which is now available in most software packages. If, however, you forget to drop the intercept term in 
the model and estimate the following model that includes the intercept term 


$$\Delta Y_t = \beta_1 + \beta_2 \Delta X_t + \varepsilon_t \tag{12.9.8}$$

then you are implicitly assuming that the original model has a trend in it, and $\beta_1$ represents the coefficient of the trend variable.³⁷ Therefore, one “accidental” benefit of introducing the intercept term in the first-difference model is to test for the presence of a trend variable in the original model.
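The following sketch uses simulated data and NumPy to contrast Eq. (12.9.7), estimated through the origin, with Eq. (12.9.8), which keeps an intercept. The simulated level model contains a trend term with coefficient 0.3, and its error is a random walk (ρ = 1). So the intercept in the first-difference regression should come out near 0.3, in line with the argument in footnote 37. All parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
t = np.arange(1, n + 1, dtype=float)
x = np.cumsum(rng.normal(size=n))              # made-up trending regressor
u = np.cumsum(rng.normal(scale=0.5, size=n))   # rho = 1: random-walk error
y = 10.0 + 0.3 * t + 1.5 * x + u               # level model that contains a trend

dy, dx = np.diff(y), np.diff(x)

# Eq. (12.9.7): regression through the origin, slope = sum(dx * dy) / sum(dx^2)
b2_origin = np.sum(dx * dy) / np.sum(dx ** 2)

# Eq. (12.9.8): same regression with an intercept, which estimates the trend coefficient
Z = np.column_stack([np.ones(dx.size), dx])
(b1_int, b2_int), *_ = np.linalg.lstsq(Z, dy, rcond=None)
print(b2_origin, b1_int, b2_int)   # b1_int should be near 0.3, both slopes near 1.5
```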

Returning to our wages–productivity regression (12.5.2), and given the AR(1) scheme and a low d value in relation to $r^2$, we rerun Eq. (12.5.2) in the first-difference form without the intercept term; remember that Eq. (12.5.2) is in the level form. The results are as follows:³⁸

$$\begin{aligned}
\Delta \hat{Y}_t &= 0.6539\,\Delta X_t \\
t &= (11.4042) \qquad r^2 = 0.4264 \qquad d = 1.7442
\end{aligned} \tag{12.9.9}$$

Compared with the level form regression (12.5.2), we see that the slope coefficient has not changed much, but the $r^2$ value has dropped considerably. This is generally the case because by taking the first differences we are essentially studying the behavior of variables around their (linear) trend values. Of course, we cannot compare the $r^2$ of Eq. (12.9.9) directly with that of Eq. (12.5.2) because the dependent variables in the two models are different.³⁹ Also, notice that compared with the original regression, the d value has increased dramatically, perhaps indicating that there is little autocorrelation in the first-difference regression.⁴⁰


36Maddala, op. cit., p. 232. 


37 This is easy to show. Let $Y_t = \alpha_1 + \beta_1 t + \beta_2 X_t + u_t$. Therefore, $Y_{t-1} = \alpha_1 + \beta_1(t - 1) + \beta_2 X_{t-1} + u_{t-1}$. Subtracting the latter from the former, you will obtain $\Delta Y_t = \beta_1 + \beta_2 \Delta X_t + \varepsilon_t$, which shows that the intercept term in this equation is indeed the coefficient of the trend variable in the original model. Remember that we are assuming that ρ = 1.

38 In Exercise 12.38 you are asked to run this model, including the constant term.


39 The comparison of $r^2$ in the level and first-difference forms is slightly involved. For an extended discussion on this, see Maddala, op. cit., Chapter 6.


40 It is not clear whether the computed d in the first-difference regression can be interpreted in the same way as it was in the original, level form regression. However, applying the runs test, it can be seen that there is no evidence of autocorrelation in the residuals of the first-difference regression.
Another interesting aspect of the first-difference transformation relates to the stationarity properties of the underlying time series. Return to Eq. (12.2.1), which describes the AR(1) scheme. Now if in fact ρ = 1, then it is clear from Eqs. (12.2.3) and (12.2.4) that the series $u_t$ is nonstationary, for the variances and covariances become infinite. That is why, when we discussed this topic, we put the restriction that |ρ| < 1. But it is clear from Eq. (12.2.1) that if the autocorrelation coefficient is in fact 1, then Eq. (12.2.1) becomes

$$u_t = u_{t-1} + \varepsilon_t$$

or

$$(u_t - u_{t-1}) = \Delta u_t = \varepsilon_t \tag{12.9.10}$$

That is, it is the first-differenced $u_t$ that becomes stationary, for it is equal to $\varepsilon_t$, which is a white noise error term.
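A quick simulation with NumPy and made-up draws illustrates both halves of the argument. The variance of a random walk $u_t = u_{t-1} + \varepsilon_t$ keeps growing with t, whereas its first difference simply gives back the white noise $\varepsilon_t$.

```python
import numpy as np

rng = np.random.default_rng(3)

# One path of u_t = u_{t-1} + eps_t (rho = 1), starting from u_1 = eps_1
eps = rng.normal(size=1000)
u = np.cumsum(eps)
du = np.diff(u)                       # Delta u_t = u_t - u_{t-1}
print(np.allclose(du, eps[1:]))       # True: first differences give back the white noise

# Across many simulated paths, Var(u_t) grows with t, so u_t is nonstationary
paths = np.cumsum(rng.normal(size=(2000, 200)), axis=1)
var_t = paths.var(axis=0)             # cross-sectional variance of u_t at each t
print(var_t[49], var_t[199])          # roughly 50 and 200 when sigma^2 = 1
```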

The point of the preceding discussion is that if the original time series are nonstationary, very often their first differences become stationary. And, therefore, the first-difference transformation serves a dual purpose in that it might get rid of (first-order) autocorrelation and also render the time series stationary. We will revisit this topic in Part 5, where we discuss the econometrics of time series analysis in some depth.

We mentioned that the first-difference transformation may be appropriate if ρ is high or d is low. Strictly speaking, the first-difference transformation is valid only if ρ = 1. As a matter of fact, there is a test, called the Berenblutt–Webb test,⁴¹ to test the hypothesis that ρ = 1. The test statistic they use is called the g statistic, which is defined as follows:

$$g = \frac{\sum_{t=2}^{n} \hat{e}_t^2}{\sum_{t=1}^{n} \hat{u}_t^2} \tag{12.9.11}$$

where $\hat{u}_t$ are the OLS residuals from the original (i.e., level form) regression and $\hat{e}_t$ are the OLS residuals from the first-difference regression. Keep in mind that in the first-difference form there is no intercept.

To test the significance of the g statistic, assuming that the level form regression contains the intercept term, we can use the Durbin–Watson tables except that now the null hypothesis is that ρ = 1 rather than the Durbin–Watson hypothesis that ρ = 0.

Revisiting our wages–productivity regression, for the original regression (12.5.2) we obtain $\sum \hat{u}_t^2 = 0.0214$ and $\sum \hat{e}_t^2 = 0.0046$. Putting these values into the g statistic given in Eq. (12.9.11), we obtain

$$g = \frac{0.0046}{0.0214} = 0.2149 \tag{12.9.12}$$

Consulting the Durbin–Watson table for 45 observations (the table entry closest to our sample size) and 1 explanatory variable (Appendix D, Table D.5), we find that $d_L = 1.288$ and $d_U = 1.376$ (1 percent level). Since the observed g lies below the lower limit of d, we do not reject the hypothesis that the true ρ = 1. Keep in mind that although we use the same Durbin–Watson tables, now the null hypothesis is that ρ = 1 and not that ρ = 0. In view of this finding, the results given in Eq. (12.9.9) may be acceptable.
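Once both regressions have been run, the g statistic is straightforward to compute. The sketch below uses NumPy, hypothetical helper names, and simulated data. It fits the level regression with an intercept and the first-difference regression through the origin, and then forms the ratio in Eq. (12.9.11). The resulting g would then be compared with the Durbin–Watson bounds, as described above.

```python
import numpy as np

def berenblut_webb_g(level_resid, fd_resid):
    """g of Eq. (12.9.11): sum of squared first-difference residuals divided by
    the sum of squared level-form residuals."""
    e = np.asarray(fd_resid, dtype=float)
    u = np.asarray(level_resid, dtype=float)
    return np.sum(e ** 2) / np.sum(u ** 2)

def g_from_data(y, x):
    """Fit the level regression (with intercept) and the first-difference
    regression (through the origin), then return g."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    X = np.column_stack([np.ones(y.size), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ b                                  # level-form residuals
    dy, dx = np.diff(y), np.diff(x)
    b2 = np.sum(dx * dy) / np.sum(dx ** 2)         # no intercept in first differences
    e = dy - b2 * dx                               # first-difference residuals
    return berenblut_webb_g(u, e)

print(0.0046 / 0.0214)   # about 0.215 with the rounded sums reported in the text

rng = np.random.default_rng(5)
x = np.cumsum(rng.normal(size=100))
y = 2.0 + 0.7 * x + np.cumsum(rng.normal(size=100))   # made-up data, rho = 1 in the error
print(g_from_data(y, x))   # typically small when rho = 1
```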


ρ Based on Durbin–Watson d Statistic


If we cannot use the first-difference transformation because ρ is not sufficiently close to unity, we have an easy method of estimating it from the relationship between d and ρ established previously, $d \approx 2(1 - \hat{\rho})$, from which we can estimate ρ as follows:

$$\hat{\rho} \approx 1 - \frac{d}{2} \tag{12.9.13}$$


41 I. I. Berenblutt and G. I. Webb, “A New Test for Autocorrelated Errors in the Linear Regression Model,” Journal of the Royal Statistical Society, Series B, vol. 35, no. 1, 1973, pp. 33-50.
Thus, in reasonably large samples one can obtain ρ̂ from Eq. (12.9.13) and use it to transform the data as shown in the generalized difference equation (12.9.5). Keep in mind that the relationship between ρ and d given in Eq. (12.9.13) may not hold true in small samples, for which Theil and Nagar have proposed a modification, which is given in Exercise 12.6.

In our wages–productivity regression (12.5.2), we obtain a d value of 0.2176. Using this value in Eq. (12.9.13), we obtain ρ̂ ≈ 0.8912. Using this estimated ρ value, we can estimate regression (12.9.5). All we have to do is subtract 0.8912 times the previous value of Y from its current value and similarly subtract 0.8912 times the previous value of X from its current value and run the OLS regression on the variables thus transformed as in Eq. (12.9.6), where $Y_t^* = (Y_t - 0.8912Y_{t-1})$ and $X_t^* = (X_t - 0.8912X_{t-1})$.
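Eq. (12.9.13) comes down to a single line of arithmetic. The sketch below reproduces the text's value from d = 0.2176. It then uses a long simulated AR(1) series, an assumption made purely for illustration, to show that 1 − d/2 comes out close to the true ρ. The function name is hypothetical.

```python
import numpy as np

def rho_from_d(d):
    """Large-sample approximation of Eq. (12.9.13): rho_hat = 1 - d / 2."""
    return 1.0 - d / 2.0

print(round(rho_from_d(0.2176), 4))   # 0.8912, as in the text

# Made-up AR(1) series with rho = 0.85; in a long sample 1 - d/2 is close to 0.85
rng = np.random.default_rng(6)
u = np.zeros(5000)
for t in range(1, u.size):
    u[t] = 0.85 * u[t - 1] + rng.normal()
d = np.sum(np.diff(u) ** 2) / np.sum(u ** 2)   # Durbin-Watson d of the series
print(rho_from_d(d))
```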


ρ Estimated from the Residuals


If the AR(1) scheme $u_t = \rho u_{t-1} + \varepsilon_t$ is valid, a simple way to estimate ρ is to regress the residuals $\hat{u}_t$ on $\hat{u}_{t-1}$, for the $\hat{u}_t$ are consistent estimators of the true $u_t$, as noted previously. That is, we run the following regression:

$$\hat{u}_t = \rho \hat{u}_{t-1} + v_t \tag{12.9.14}$$

where $\hat{u}_t$ are the residuals obtained from the original (level form) regression and $v_t$ is the error term of this regression. Note that there is no need to introduce the intercept term in Eq. (12.9.14), for we know the OLS residuals sum to zero.
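Because Eq. (12.9.14) has no intercept, its slope has the closed form $\hat{\rho} = \sum \hat{u}_t\hat{u}_{t-1} / \sum \hat{u}_{t-1}^2$. The minimal NumPy sketch below uses a hypothetical function name and simulated data. It applies that formula to the residuals of a level-form OLS regression.

```python
import numpy as np

def rho_from_residuals(u):
    """OLS slope of u_t on u_{t-1} with no intercept, Eq. (12.9.14):
    rho_hat = sum(u_t * u_{t-1}) / sum(u_{t-1}^2)."""
    u = np.asarray(u, dtype=float)
    return np.dot(u[1:], u[:-1]) / np.dot(u[:-1], u[:-1])

# Made-up data: Y = 1 + 2X + u with AR(1) errors, rho = 0.6
rng = np.random.default_rng(10)
n = 3000
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(rho_from_residuals(y - X @ b))   # close to 0.6
```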

The residuals from our wages–productivity regression given in Eq. (12.5.1) are already shown in Table 12.5. Using these residuals, the following regression results were obtained:

$$\begin{aligned}
\hat{u}_t &= 0.8678\,\hat{u}_{t-1} \\
t &= (12.7359) \qquad r^2 = 0.7863
\end{aligned} \tag{12.9.15}$$

As this regression shows, ρ̂ = 0.8678. Using this estimate, one can transform the original model as per Eq. (12.9.6). Since the ρ estimated by this procedure is about the same as that obtained from the Durbin–Watson d, the regression results using the ρ̂ of Eq. (12.9.15) should not be very different from those obtained from the ρ̂ estimated from the Durbin–Watson d. We leave it to the reader to verify this.


Iterative Methods of Estimating ρ


All the methods of estimating ρ discussed previously provide us with only a single estimate of ρ. But there are the so-called iterative methods that estimate ρ iteratively, that is, by successive approximation, starting with some initial value of ρ. Among these methods the following may be mentioned: the Cochrane–Orcutt iterative procedure, the Cochrane–Orcutt two-step procedure, the Durbin two-step procedure, and the Hildreth–Lu scanning or search procedure. Of these, the most popular is the Cochrane–Orcutt iterative method. To save space, the iterative methods are discussed by way of exercises. Remember that the ultimate objective of these methods is to provide an estimate of ρ that may be used to obtain GLS estimates of the parameters. One advantage of the Cochrane–Orcutt iterative method is that it can be used to estimate not only an AR(1) scheme, but also higher-order autoregressive schemes, such as $\hat{u}_t = \hat{\rho}_1 \hat{u}_{t-1} + \hat{\rho}_2 \hat{u}_{t-2} + v_t$, which is AR(2). Having obtained the two ρ's, one can easily extend the generalized difference equation (12.9.6). Of course, the computer can now do all this.
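The Cochrane–Orcutt iteration mentioned above can be sketched in a few lines of NumPy. This is an illustrative version that rests on several assumptions: one regressor, AR(1) errors, the first observation dropped at every step, and convergence judged by the change in ρ̂. It is not the algorithm as implemented in Stata or any other package. The data are simulated.

```python
import numpy as np

def cochrane_orcutt(y, x, tol=1e-6, max_iter=100):
    """Iterative Cochrane-Orcutt sketch for Y_t = b1 + b2 X_t + u_t with AR(1) errors.
    Returns (array [b1, b2], rho_hat)."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    X = np.column_stack([np.ones(y.size), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)          # start from plain OLS
    rho = 0.0
    for _ in range(max_iter):
        u = y - X @ b                                    # residuals of the ORIGINAL equation
        rho_new = np.dot(u[1:], u[:-1]) / np.dot(u[:-1], u[:-1])
        ys = y[1:] - rho_new * y[:-1]                    # generalized differences, Eq. (12.9.5)
        xs = x[1:] - rho_new * x[:-1]
        # Using (1 - rho) as the "constant" column makes its coefficient b1 itself
        Z = np.column_stack([np.full(ys.size, 1.0 - rho_new), xs])
        b, *_ = np.linalg.lstsq(Z, ys, rcond=None)
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return b, rho

# Made-up data: b1 = 2, b2 = 0.5, rho = 0.7
rng = np.random.default_rng(7)
n = 2000
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 2.0 + 0.5 * x + u
b, rho = cochrane_orcutt(y, x)
print(b, rho)
```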

Returning to our wages–productivity regression, and assuming an AR(1) scheme, we use the Cochrane–Orcutt iterative method, which gives the following estimates of ρ: 0.8876, 0.9944, and 0.8827. The last value of 0.8827 can now be used to transform the original model as in Eq. (12.9.6) and estimate it by OLS. Of course, OLS on the transformed model is simply GLS. The results are as follows:
Stata can estimate the coefficients of the model along with ρ. For example, if we assume the AR(1), Stata produces the following results:

$$\begin{aligned}
\hat{Y}_t &= 45.1042 + 0.5712X_t \\
\text{se} &= (4.3722) \quad (0.0415) \\
t &= (9.8586) \quad (13.7638) \qquad r^2 = 0.8146
\end{aligned} \tag{12.9.16}$$

From these results, we can see that the estimated ρ (ρ̂) is ≈ 0.8827, which is not very different from the ρ̂ in Eq. (12.9.15).

As noted before, in the generalized difference equation (12.9.6) we lose one observation because the first observation has no antecedent. To avoid losing the first observation, we can use the Prais–Winsten transformation. Using this transformation, and using Stata (version 10), we obtain the following results for our wages–productivity regression:

$$\begin{aligned}
\widehat{\text{Rcompb}}_t &= 32.0434 + 0.6628\,\text{Prodb}_t \\
\text{se} &= (3.7182) \quad (0.0386) \qquad r^2 = 0.8799
\end{aligned} \tag{12.9.17}$$

In this transformation, the ρ value was 0.9193, which was obtained after 13 iterations. It should be pointed out that if we do not transform the first observation à la Prais–Winsten and instead drop that observation, the results are sometimes substantially different, especially in small samples. Notice that the ρ obtained here is not much different from the one obtained in Eq. (12.9.15).


General Comments 


There are several points about correcting for autocorrelation using the various methods discussed above.

First, since the OLS estimators are consistent despite autocorrelation, in large samples it makes little difference whether we estimate ρ from the Durbin–Watson d, or from the regression of the residuals in the current period on the residuals in the previous period, or from the Cochrane–Orcutt iterative procedure, because they all provide consistent estimates of the true ρ. Second, the various methods discussed above are basically two-step methods. In step 1 we obtain an estimate of the unknown ρ and in step 2 we use that estimate to transform the variables to estimate the generalized difference equation, which is basically GLS. But since we use ρ̂ instead of the true ρ, all these methods of estimation are known in the literature as feasible GLS (FGLS) or estimated GLS (EGLS) methods.

Third, it is important to note that whenever we use an FGLS or EGLS method to estimate the parameters of the transformed model, the estimated coefficients will not necessarily have the usual optimum properties of the classical model, such as BLUE, especially in small samples. Without going into complex technicalities, it may be stated as a general principle that whenever we use an estimate in place of a parameter's true value, the estimated coefficients may have the usual optimum properties only asymptotically, that is, in large samples. Also, the conventional hypothesis testing procedures are, strictly speaking, valid only asymptotically. In small samples, therefore, one has to be careful in interpreting the estimated results.

Fourth, in using EGLS, if we do not include the first observation (as was originally the case with the Cochrane–Orcutt procedure), not only the numerical values but also the efficiency of the estimators can be adversely affected, especially if the sample size is small and if the regressors are not, strictly speaking, nonstochastic.⁴² Therefore, in small samples it is important to keep the first observation à la Prais–Winsten. Of course, if the sample size is reasonably large, EGLS, with or without the first observation, gives similar results. Incidentally, in the literature EGLS with the Prais–Winsten transformation is known as full EGLS, or FEGLS, for short.


42This is especially so if the regressors exhibit a trend, which is quite common in economic data. 
12.10 The Newey–West Method of Correcting the OLS Standard Errors


Instead of using the FGLS methods discussed in the previous section, we can still use OLS but correct the standard errors for autocorrelation by a procedure developed by Newey and West.⁴³ This is an extension of White’s heteroscedasticity-consistent standard errors that we discussed in the previous chapter. The corrected standard errors are known as HAC (heteroscedasticity- and autocorrelation-consistent) standard errors or simply Newey–West standard errors. We will not present the mathematics behind the Newey–West procedure, for it is involved.⁴⁴ But most modern computer packages now calculate the Newey–West standard errors. It is important to point out that the Newey–West procedure is, strictly speaking, valid in large samples and may not be appropriate in small samples. But in large samples we now have a method that produces autocorrelation-corrected standard errors, so that we do not have to worry about the EGLS transformations discussed in the previous section. Therefore, if a sample is reasonably large, one should use the Newey–West procedure to correct OLS standard errors not only in situations of autocorrelation but also in cases of heteroscedasticity, for the HAC method can handle both, unlike the White method, which was designed specifically for heteroscedasticity.
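The text leaves the mathematics aside, but the core of the Newey–West estimator is a "sandwich" formula. Its middle matrix adds weighted autocovariances of $x_t\hat{u}_t$ to White's heteroscedasticity term. The NumPy sketch below uses Bartlett weights $1 - l/(L + 1)$. The lag-length rule $\lfloor 4(n/100)^{2/9} \rfloor$ is one common rule of thumb and is an assumption here. Packages such as EViews or Stata may choose the lag length differently or apply small-sample adjustments, so their numbers need not match this sketch exactly. The data are simulated.

```python
import numpy as np

def ols_with_hac(y, X, lags):
    """OLS coefficients plus Newey-West (HAC) standard errors with Bartlett
    weights w_l = 1 - l / (lags + 1). X must include the constant column.
    No degrees-of-freedom correction is applied."""
    y, X = np.asarray(y, dtype=float), np.asarray(X, dtype=float)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ b
    g = X * u[:, None]                    # row t holds x_t * u_t
    S = g.T @ g                           # lag-0 term: White's (HC0) middle matrix
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1.0)
        gamma = g[l:].T @ g[:-l]          # sum over t of (x_t u_t)(x_{t-l} u_{t-l})'
        S += w * (gamma + gamma.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    V = XtX_inv @ S @ XtX_inv             # sandwich covariance matrix
    return b, np.sqrt(np.diag(V))

# Made-up data: regressor and error both follow AR(1) with coefficient 0.8
rng = np.random.default_rng(8)
n = 1000
x = np.zeros(n)
u = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + rng.normal()
    u[t] = 0.8 * u[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones(n), x])
lags = int(4 * (n / 100) ** (2 / 9))      # one common rule of thumb (an assumption here)
b, hac_se = ols_with_hac(y, X, lags)

resid = y - X @ b
s2 = resid @ resid / (n - X.shape[1])
ols_se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))   # conventional OLS standard errors
print(ols_se, hac_se)   # here the HAC slope standard error is noticeably larger
```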

Once again let us return to our wages–productivity regression (12.5.1). We know that this regression suffers from autocorrelation. Our sample of 46 observations is reasonably large, so we can use the HAC procedure. Using EViews 4, we obtain the following regression results:

$$\begin{aligned}
\hat{Y}_t &= 32.7419 + 0.6704X_t \\
\text{se} &= (2.9162)^{*} \quad (0.0302)^{*} \\
& r^2 = 0.9765 \qquad d = \ldots
\end{aligned} \tag{12.10.1}$$

where * denotes HAC standard errors.


Comparing this regression with Eq. (12.5.1), we find that in both equations the estimated coefficients and the $r^2$ value are the same. But, importantly, note that the HAC standard errors are much greater than the OLS standard errors and therefore the HAC t ratios are much smaller than the OLS t ratios. This shows that OLS had in fact underestimated the true standard errors. Note also that the d statistics in both Eqs. (12.5.1) and (12.10.1) are the same. This is to be expected, since the HAC procedure leaves the OLS coefficients and residuals unchanged and corrects only the standard errors.


12.11 OLS versus FGLS and HAC 


The practical problem facing the researcher is this: In the presence of autocorrelation, OLS estimators, although unbiased, consistent, and asymptotically normally distributed, are not efficient. Therefore, the usual inference procedure based on the t, F, and χ² tests is no longer appropriate. On the other hand, FGLS produces estimators that are (asymptotically) efficient, and HAC produces standard errors that are valid in large samples, but the finite, or small-sample, properties of these procedures are not well documented. This means that in small samples FGLS and HAC might actually do worse than OLS. As a matter of fact, in a Monte Carlo study Griliches and Rao⁴⁵ found that if the sample is relatively small and the coefficient of autocorrelation, ρ, is less than 0.3, OLS is as good as or better than FGLS. As a practical matter, then, one may use OLS in small samples in which the estimated ρ is, say, less than 0.3. Of course,


43 W. K. Newey and K. West, “A Simple Positive Semi-Definite Heteroscedasticity and Autocorrelation Consistent Covariance Matrix,” Econometrica, vol. 55, 1987, pp. 703-708.


44 If you can handle matrix algebra, the method is discussed in Greene, op. cit., 4th ed., pp. 462-463.


45 Z. Griliches and P. Rao, “Small Sample Properties of Several Two-stage Regression Methods in the Context of Autocorrelated Errors,” Journal of the American Statistical Association, vol. 64, 1969, pp. 253-272.
what is a large and what is a small sample are relative questions, and one has to use some practical judgment. If you have only 15 to 20 observations, the sample may be small, but if you have, say, 50 or more observations, the sample may be reasonably large.


12.12 Additional Aspects of Autocorrelation 


Dummy Variables and Autocorrelation 


In Chapter 9 we considered dummy variable regression models. In particular, recall the Indian savings–income regression model for 1974–75 to 1995–96 that we presented in Eq. (9.5.1), which for convenience is reproduced below:

$$Y_t = \alpha_1 + \alpha_2 D_t + \beta_1 X_t + \beta_2 (D_t X_t) + u_t \tag{12.12.1}$$

where Y = savings
X = income
D = 1 for observations in period 1989–90 to 1995–96
D = 0 for observations in period 1974–75 to 1988–89


The regression results based on this model are given in Eq. (9.5.4). Of course, this model was estimated with 
the usual OLS assumptions. 

But now suppose that $u_t$ follows a first-order autoregressive, AR(1), scheme. That is, $u_t = \rho u_{t-1} + \varepsilon_t$. Ordinarily, if ρ is known or can be estimated by one of the methods discussed above, we can use the generalized difference method to estimate the parameters of the model that is free from (first-order) autocorrelation. However, the presence of the dummy variable D poses a special problem: Note that the dummy variable simply classifies an observation as belonging to the first or second period. How do we transform it? One can use the following procedure.⁴⁶

1. In Eq. (12.12.1), values of D are zero for all observations in the first period; in period 2 the value of D for the first observation is 1/(1 − ρ) instead of 1, and 1 for all other observations.

2. The variable $X_t$ is transformed as $(X_t - \rho X_{t-1})$. Note that we lose one observation in this transformation, unless one resorts to the Prais–Winsten transformation for the first observation, as noted earlier.

3. The value of $D_tX_t$ is zero for all observations in the first period (Note: $D_t$ is zero in the first period); in the second period the first observation takes the value $D_tX_t = X_t$, and the remaining observations in the second period are set to $(D_tX_t - D_t\rho X_{t-1}) = (X_t - \rho X_{t-1})$. (Note: the value of $D_t$ in the second period is 1.)
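Steps 1 to 3 can be written compactly. The sketch below uses NumPy, a hypothetical function name, and made-up data. It assumes that ρ is known and that period 1 precedes period 2. Dividing $D_t - \rho D_{t-1}$ by $(1 - \rho)$ reproduces the values in step 1: 0, then $1/(1 - \rho)$, then 1. With that scaling, the coefficient on the transformed D estimates $\alpha_2(1 - \rho)$, in the same way that the constant estimates $\alpha_1(1 - \rho)$.

```python
import numpy as np

def transform_dummy_model(y, x, d, rho):
    """Generalized-difference data for Y_t = a1 + a2 D_t + b1 X_t + b2 D_t X_t + u_t
    with AR(1) errors, following steps 1-3 in the text (first observation dropped).
    d must be 0 in the first period and 1 in the second (later) period."""
    y, x, d = (np.asarray(v, dtype=float) for v in (y, x, d))
    ys = y[1:] - rho * y[:-1]
    ds = (d[1:] - rho * d[:-1]) / (1.0 - rho)   # step 1: 0, then 1/(1 - rho), then 1
    xs = x[1:] - rho * x[:-1]                   # step 2
    dx = d * x
    dxs = dx[1:] - rho * dx[:-1]                # step 3: 0, then X_t, then X_t - rho X_{t-1}
    Z = np.column_stack([np.ones(ys.size), ds, xs, dxs])
    return ys, Z

# Made-up check: u_t = 0.5 u_{t-1} exactly (eps_t = 0), so the fit is exact
x = np.arange(1.0, 13.0)
d = np.array([0.0] * 6 + [1.0] * 6)
rho = 0.5
u = 0.5 ** np.arange(12)
y = 1.0 + 2.0 * d + 3.0 * x + 4.0 * d * x + u
ys, Z = transform_dummy_model(y, x, d, rho)
coef, *_ = np.linalg.lstsq(Z, ys, rcond=None)
print(coef)   # approximately [0.5, 1.0, 3.0, 4.0]: a1(1 - rho), a2(1 - rho), b1, b2
```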

As the preceding discussion points out, the critical observation is the first observation in the second period. If this is taken care of in the manner just suggested, there should be no problem in estimating regressions like Eq. (12.12.1) subject to AR(1) autocorrelation. In Exercise 12.37, the reader is asked to carry out such a transformation for the data on U.S. savings and income given in Chapter 9.


ARCH and GARCH Models 


Just as the error term u at time t can be correlated with the error term at time (t − 1) in an AR(1) scheme or with various lagged error terms in a general AR(p) scheme, can there be autocorrelation in the variance σ² at time t with its values lagged one or more periods? Such an autocorrelation has been observed by researchers engaged in forecasting financial time series, such as stock prices, inflation rates, and foreign exchange rates. Such autocorrelation is given the rather daunting names autoregressive conditional heteroscedasticity


46See Maddala, op. cit., pp. 321-322. 
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(ARCH) if the error variance is related to the squared error term in the previous term and generalized 
autoregressive conditional heteroscedasticity (GARCH) if the error variance is related to squared error 
terms several periods in the past. Since this topic belongs in the general area of time series econometrics, we 
will discuss it in some depth in the chapters on time series econometrics. Our objective here is to point out 
that autocorrelation is not confined to relationships between current and past error terms but also with current 
and past error variances. 
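A small simulation can make the idea concrete. The Python sketch below assumes an ARCH(1) process, $u_t = \sigma_t \varepsilon_t$ with $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2$ and $\varepsilon_t$ standard normal; the values $\alpha_0 = 1$ and $\alpha_1 = 0.3$, the seed, and the sample size are illustrative choices, not taken from the text. The simulated errors show essentially no first-order autocorrelation, whereas their squares do, which is the "autocorrelation in the variance" just described.

```python
import math
import random


def simulate_arch1(n, alpha0, alpha1, seed=0):
    """Simulate u_t = sigma_t * e_t with sigma_t^2 = alpha0 + alpha1 * u_{t-1}^2."""
    rng = random.Random(seed)
    u_prev = 0.0
    u = []
    for _ in range(n):
        sigma2 = alpha0 + alpha1 * u_prev ** 2  # conditional variance at time t
        u_t = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        u.append(u_t)
        u_prev = u_t
    return u


def lag1_autocorrelation(z):
    """Sample first-order autocorrelation of the series z."""
    n = len(z)
    mean = sum(z) / n
    dev = [v - mean for v in z]
    num = sum(dev[t] * dev[t - 1] for t in range(1, n))
    den = sum(d * d for d in dev)
    return num / den


u = simulate_arch1(n=20000, alpha0=1.0, alpha1=0.3, seed=42)
print("lag-1 autocorrelation of u_t:  ", round(lag1_autocorrelation(u), 3))
print("lag-1 autocorrelation of u_t^2:", round(lag1_autocorrelation([v * v for v in u]), 3))
```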


Coexistence of Autocorrelation and Heteroscedasticity 


What happens if a regression model suffers from both heteroscedasticity and autocorrelation? Can we solve the problem sequentially, that is, take care of heteroscedasticity first and then autocorrelation? As a matter of fact, one author contends that "Autoregression can only be detected after the heteroscedasticity is controlled for."⁴⁷ But can we develop an omnipotent test that can solve these and other problems (e.g., model specification) simultaneously? Yes, such tests exist, but their discussion will take us far afield. It is better to leave them for the references.⁴⁸ However, as noted earlier, we can use the HAC standard errors, for they take into account both autocorrelation and heteroscedasticity, provided the sample is reasonably large.


12.13 A Concluding Example 


In Example 10.2, we presented data on consumption, income, wealth, and interest rates for the U.S., all in 
real terms. Based on these data, we estimated the following consumption function for the U.S. for the period 
1947-2000, regressing the log of consumption on the logs of income and wealth. We did not express the 
interest rate in the log form because some of the real interest rate figures were negative. 


Dependent Variable: ln(CONSUMPTION)
Method: Least Squares
Sample: 1947–2000
Included observations: 54

Variable        Coefficient    Std. Error    t-Statistic    Prob.
C                -0.467711      0.042778     -10.93343      0.0000
ln(INCOME)        0.804873      0.017498      45.99836      0.0000
ln(WEALTH)        0.201270      0.017593      11.44060      0.0000
INTEREST         -0.002689      0.000762      -3.529265     0.0009

R-squared             0.999560    Mean dependent var.    7.826093
Adjusted R-squared    0.999533    S.D. dependent var.    0.552306
S.E. of regression    0.011934    F-statistic            37832.59
Sum squared resid.    0.007121    Prob. (F-statistic)    0.000000
Log likelihood        164.5880    Durbin–Watson stat.    1.289219


47Lois W. Sayrs, Pooled Time Series Analysis, Sage Publications, California, 1989, p. 19. 


48See Jeffrey M. Wooldridge, op. cit., pp. 402–403, and A. K. Bera and C. M. Jarque, "Efficient Tests for Normality, Homoscedasticity and Serial Independence of Regression Residuals: Monte Carlo Evidence," Economics Letters, vol. 7, 1981, pp. 313–318.

As expected, the income and wealth elasticities are positive and the interest rate semielasticity is negative. Although the estimated coefficients seem to be individually highly statistically significant, we need to check for possible autocorrelation in the error term. As we know, in the presence of autocorrelation, the estimated standard errors may be underestimated. Examining the Durbin–Watson d statistic, it seems that the error terms in the consumption function suffer from (first-order) autocorrelation (check this out).
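To "check this out," recall that $d = \sum_{t=2}^{n}(\hat{u}_t - \hat{u}_{t-1})^2 / \sum_{t=1}^{n}\hat{u}_t^2$ and that, in large samples, $d \approx 2(1 - \hat{\rho})$, so $\hat{\rho} \approx 1 - d/2$. The Python sketch below computes both; the residual vector is hypothetical, and only d = 1.289219 comes from the regression above (it implies $\hat{\rho} \approx 0.355$). A formal conclusion still requires comparing d with the tabulated lower and upper bounds.

```python
def durbin_watson(resid):
    """Durbin-Watson d = sum_{t=2}^n (u_t - u_{t-1})^2 / sum_{t=1}^n u_t^2."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


def rho_from_d(d):
    """Large-sample approximation rho_hat = 1 - d/2."""
    return 1.0 - d / 2.0


# Hypothetical residuals that drift slowly, as positively autocorrelated errors do
resid = [0.5, 0.4, 0.3, -0.1, -0.3, -0.4, -0.2, 0.1, 0.3, 0.4]
d = durbin_watson(resid)
print(round(d, 3), round(rho_from_d(d), 3))

# The d reported for the OLS consumption function above
print(round(rho_from_d(1.289219), 4))
```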


To confirm this, we estimated the consumption function, allowing for AR(1) autocorrelation. The results 
are as follows: 


Dependent Variable: ln(CONSUMPTION)
Method: Least Squares
Sample (adjusted): 1948–2000
Included observations: 53 after adjustments
Convergence achieved after 11 iterations

Variable        Coefficient    Std. Error    t-Statistic    Prob.
C                -0.399833      0.070954      -5.6351       0.0000
ln(INCOME)        0.845854      0.029275      28.89313      0.0000
ln(WEALTH)        …             0.027462      …             0.0000
INTEREST          0.001214      0.000925       1.312986     0.1954
AR(1)             0.612443      0.100591       6.088462     0.0000

R-squared             0.999688    Mean dependent var.    …
Adjusted R-squared    0.999662    S.D. dependent var.    0.541833
S.E. of regression    0.009954    F-statistic            38503.59
Sum squared resid.    0.004756    Prob. (F-statistic)    0.000000
Log likelihood        171.738     Durbin–Watson stat.    …


These results clearly show that our regression suffers from autocorrelation. We leave it to the reader to remove autocorrelation using some of the transformations discussed in this chapter. You may use the estimated ρ of 0.6124 for the transformations. Below, we present the results based on Newey–West (HAC) standard errors that take into account autocorrelation.


Dependent Variable: ln(CONSUMPTION)
Method: Least Squares
Sample: 1947–2000
Included observations: 54
Newey–West HAC Standard Errors & Covariance (lag truncation = 3)

Variable        Coefficient    Std. Error    t-Statistic    Prob.
C                -0.467714      0.043937     -10.64516      0.0000
ln(INCOME)        0.804871      0.017117      47.02132      0.0000
ln(WEALTH)        0.201272      0.015447      13.02988      0.0000
INTEREST         -0.002689      0.000880      -3.056306     0.0036

R-squared             0.999560    Mean dependent var.    7.826093
Adjusted R-squared    0.999533    S.D. dependent var.    0.552306
S.E. of regression    0.011934    F-statistic            37832.59
Sum squared resid.    0.007121    Prob. (F-statistic)    0.000000
Durbin–Watson stat.   1.289219

The major difference between the first and the last of the above regressions is that the standard errors of the estimated coefficients have changed substantially. Despite this, the estimated slope coefficients are still highly statistically significant. However, there is no guarantee that this will always be the case.
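The HAC idea can be sketched for a two-variable regression $Y_t = \beta_1 + \beta_2 X_t + u_t$. The Python code below is an illustrative sketch, not the EViews routine used for the table above: it forms the sandwich $(X'X)^{-1}\hat{S}(X'X)^{-1}$, where $\hat{S}$ adds Bartlett-weighted ($w_j = 1 - j/(L+1)$) cross-products of residual-weighted regressors up to the lag truncation L. It applies no degrees-of-freedom correction, so its numbers may differ slightly from packaged output; with L = 0 it reduces to White's heteroscedasticity-consistent (HC0) estimator. The function names and data are hypothetical.

```python
def ols_simple(y, x):
    """OLS intercept and slope for the two-variable model y = b1 + b2 * x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b2 * xbar, b2


def newey_west_cov(y, x, lags):
    """Newey-West HAC covariance matrix of (b1, b2) using Bartlett weights.

    No small-sample (degrees-of-freedom) correction is applied.
    """
    n = len(y)
    b1, b2 = ols_simple(y, x)
    e = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    rows = [(1.0, xi) for xi in x]  # regressor rows (1, X_t)

    # Inverse of X'X for the 2 x 2 case
    sum_x, sum_x2 = sum(x), sum(xi * xi for xi in x)
    det = n * sum_x2 - sum_x * sum_x
    inv = [[sum_x2 / det, -sum_x / det], [-sum_x / det, n / det]]

    # S = Gamma_0 + sum_{j=1}^{L} w_j (Gamma_j + Gamma_j'), w_j = 1 - j/(L+1)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(lags + 1):
        w = 1.0 - j / (lags + 1.0)
        for t in range(j, n):
            g_t = [e[t] * v for v in rows[t]]          # e_t * x_t
            g_l = [e[t - j] * v for v in rows[t - j]]  # e_{t-j} * x_{t-j}
            for r in range(2):
                for q in range(2):
                    term = g_t[r] * g_l[q]
                    if j > 0:
                        term += g_l[r] * g_t[q]        # add the transpose
                    s[r][q] += w * term

    # Sandwich: (X'X)^{-1} S (X'X)^{-1}
    mid = [[sum(inv[r][k] * s[k][q] for k in range(2)) for q in range(2)] for r in range(2)]
    return [[sum(mid[r][k] * inv[k][q] for k in range(2)) for q in range(2)] for r in range(2)]


# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.0, 2.9, 4.2, 4.8, 6.1, 7.2, 7.8, 9.1, 10.2, 10.8]
cov = newey_west_cov(y, x, lags=3)
print("HAC standard error of the slope:", cov[1][1] ** 0.5)
```

The Bartlett weights are what keep the estimated covariance matrix positive semidefinite; simply adding unweighted autocovariances would not guarantee that.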




Summary and Conclusions 


1. If the assumption of the classical linear regression model that the errors or disturbances $u_t$ entering into the population regression function (PRF) are random or uncorrelated is violated, the problem of serial correlation, or autocorrelation, arises.

2. Autocorrelation can arise for several reasons, such as inertia or sluggishness of economic time series, specification bias resulting from excluding important variables from the model or using incorrect functional form, the cobweb phenomenon, data massaging, and data transformation. As a result, it is useful to distinguish between pure autocorrelation and "induced" autocorrelation arising from one or more of the factors just discussed.

3. Although in the presence of autocorrelation the OLS estimators remain unbiased, consistent, and asymptotically normally distributed, they are no longer efficient. As a consequence, the usual t, F, and $\chi^2$ tests cannot be legitimately applied. Hence, remedial measures may be called for.

4. The remedy depends on the nature of the interdependence among the disturbances $u_t$. But since the disturbances are unobservable, the common practice is to assume that they are generated by some mechanism.

5. The mechanism that is commonly assumed is the Markov first-order autoregressive scheme, which assumes that the disturbance in the current time period is linearly related to the disturbance term in the previous time period, the coefficient of autocorrelation ρ providing the extent of the interdependence. This mechanism is known as the AR(1) scheme.

6. If the AR(1) scheme is valid and the coefficient of autocorrelation is known, the serial correlation problem can be easily attacked by transforming the data following the generalized difference procedure. The AR(1) scheme can be easily generalized to an AR(p). One can also assume a moving average (MA) mechanism or a mixture of AR and MA schemes, known as ARMA. This topic will be discussed in the chapters on time series econometrics.


7. Even if we use an AR(1) scheme, the coefficient of autocorrelation is not known a priori. We considered several methods of estimating ρ, such as the Durbin–Watson d, Theil–Nagar modified d, Cochrane–Orcutt (C–O) iterative procedure, C–O two-step method, and the Durbin two-step procedure. In large samples, these methods generally yield similar estimates of ρ, although in small samples they perform differently. In practice, the C–O iterative method has become quite popular.

8. Using any of the methods just discussed, we can use the generalized difference method to estimate the parameters of the transformed model by OLS, which essentially amounts to GLS. But since we estimate ρ (by $\hat{\rho}$), we call the method of estimation feasible, or estimated, GLS, or FGLS or EGLS for short.

9. In using EGLS, one has to be careful in dropping the first observation, for in small samples the inclusion or exclusion of the first observation can make a dramatic difference in the results. Therefore, in small samples it is advisable to transform the first observation according to the Prais–Winsten procedure. In large samples, however, it makes little difference whether the first observation is included or not.
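The Prais–Winsten treatment of the first observation sits alongside the generalized difference used for t = 2, ..., n: the first observation of every variable, including the column of ones for the intercept, is multiplied by $\sqrt{1-\rho^2}$, and the remaining ones are quasi-differenced. A minimal Python sketch, assuming ρ has already been estimated (the value 0.6 and the data below are purely illustrative):

```python
import math


def prais_winsten_transform(z, rho):
    """Quasi-difference z under AR(1), keeping the first observation.

    z*_1 = sqrt(1 - rho^2) * z_1 and z*_t = z_t - rho * z_{t-1} for t = 2, ..., n.
    """
    first = math.sqrt(1.0 - rho ** 2) * z[0]
    return [first] + [z[t] - rho * z[t - 1] for t in range(1, len(z))]


rho = 0.6                     # illustrative value, not estimated from any data here
y = [5.0, 6.0, 7.5, 7.0]      # hypothetical observations
x = [1.0, 2.0, 3.0, 3.5]
const = [1.0] * len(y)        # the intercept column is transformed too

y_star = prais_winsten_transform(y, rho)
x_star = prais_winsten_transform(x, rho)
c_star = prais_winsten_transform(const, rho)
print(y_star)
print(x_star)
print(c_star)
```

Running OLS of `y_star` on `c_star` and `x_star`, with no separate intercept, then gives the EGLS estimates with all n observations retained.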

10. It is very important to note that the method of EGLS has the usual optimum statistical properties only in large samples. In small samples, OLS may actually do better than EGLS, especially if ρ < 0.3.

11. Instead of using EGLS, we can still use OLS but correct the standard errors for autocorrelation by the Newey–West HAC procedure. Strictly speaking, this procedure is valid in large samples. One advantage of the HAC procedure is that it corrects not only for autocorrelation but also for heteroscedasticity, if it is present.

12. Of course, before remediation comes detection of autocorrelation. There are formal and informal methods of detection. Among the informal methods, one can simply plot the actual or standardized residuals, or plot current residuals against past residuals. Among formal methods, one can use the runs test, the Durbin–Watson d test, the asymptotic normality test, the Berenblutt–Webb test, and the Breusch–Godfrey (BG) test. Of these, the most popular and routinely used is the Durbin–Watson d test. Despite its hoary past, this test has severe limitations. It is better to use the BG test, for it is much more general in that it allows for both AR and MA error structures as well as the presence of a lagged regressand as an explanatory variable. But keep in mind that it is a large-sample test.

13. In this chapter we also discussed very briefly the detection of autocorrelation in the presence of dummy regressors.


Multiple Choice Questions 


1. When error terms across cross-section data are correlated, it is known as


a. Cross correlation 

b. Cross autocorrelation 
c. Spatial autocorrelation 
d. Serial autocorrelation 


2. When error terms across time series data are intercorrelated, it is known as


a. Cross correlation 

b. Cross autocorrelation 
c. Spatial autocorrelation 
d. Serial autocorrelation 


3. The regression coefficients estimated in the presence of autocorrelation in the sample data are NOT


a. Unbiased estimators 

b. Consistent estimators 

c. Efficient estimators 

d. Linear estimators 
4. Estimating the coefficients of a regression model in the presence of autocorrelation leads to this test being NOT valid:

a. t-test 

b. F-test 

c. Chi-square test 

d. All of the above
5. There are several reasons for serial correlation to occur in sample data. Which of these is NOT a reason?

a. Business cycle

b. Specification bias

c. Manipulation of data

d. Stationary data series
6. When the supply of a commodity, for example an agricultural commodity, reacts to price with a lag of one time period due to the gestation period in production, such a phenomenon is referred to as
a. Lag phenomenon 
b. Cobweb phenomenon 
c. Inertia 
d. Business cycle 
7. If, in our regression model, one of the explanatory variables included is the lagged value of the dependent variable, then the model is referred to as
a. Best fit model 
b. Dynamic model 
c. Autoregressive model 
d. First-difference form 


8. A time series is considered stationary if the following characteristics of the series are time invariant:
a. Mean 
b. Variance 
c. Covariance 
d. All of the above 
9. Regression of $u_t$ on itself lagged one period is referred to as
a. AR(1) model 
b. AR(2) model 
c. Coefficient of autocovariance model 
d. White noise model 
10. In the regression model $u_t = \rho u_{t-1} + \varepsilon_t$, $-1 < \rho < +1$, ρ is the
a. Coefficient of autocorrelation 
b. First-order coefficient of autocorrelation 
c. Coefficient of autocorrelation at lag 1 
d. All of the above 
11. In the model given in Question 10 above, if the error term $\varepsilon_t$ satisfies all the standard OLS assumptions, then the error term is
a. OLS error term
b. White noise error term 
c. First-order error term 
d. AR(1) coefficient 
12. In Question 10 above, if $|\rho| < 1$, then the model would be
a. AR(1) stationary 
b. AR(1) nonstationary 
c. AR(1) random walk 
d. AR(1) with heteroscedasticity 
13. Estimating the regression model in the presence of autocorrelation using this method leads to BLUE estimators:
a. OLS
b. GLS
c. MLE
d. Two-stage regression estimation
14. Using the OLS estimation technique in the presence of autocorrelation will lead to
a. The t- and F-tests still being accurate
b. The t-test giving accurate results but not the F- and chi-square tests
c. An overestimated $R^2$
d. Biased estimators
15. The regression model does not include the lagged value(s) of the dependent variable as one of the
explanatory variables. This is an assumption underlying one of the following tests of autocorrelation: 
a. Durbin-Watson d test 
b. Runs test 
c. Breusch-Godfrey test 
d. Graphical method 
16. Which of these is NOT one of the assumptions underlying the d statistic?
a. The regression model includes the intercept term
b. The $u_t$ are generated by a second-order autoregressive scheme
c. The $u_t$ are normally distributed
d. There are no missing observations in the time series data
17. The d statistic value is limited to
a. 0 to 2
b. 2 to 4
c. 0 to 4
d. 4+2 
18. If the Durbin–Watson d statistic is found to be equal to 0, this means that first-order autocorrelation is
a. Perfectly positive 
b. Perfectly negative 
c. Zero 
d. Imperfect negative correlation 
19. The method used to correct for autocorrelation when ρ is not known is
a. The first-difference method
b. Cochrane–Orcutt iterative method
c. Durbin two-step procedure 
d. All of the above 
20. The FGLS and HAC methods give best results when applied to
a. Small samples
b. Large samples
c. Autocorrelated samples 
d. Positively autocorrelated samples 
21. There are several reasons why serial correlation occurs. One factor that does NOT cause serial correlation is
a. Most time-series data exhibit business cycles
b. Researchers may have excluded some important variable from the regression
c. Some variables react with a lag
d. Large variation in the observed X variables
22. By autocorrelation we mean

a. That the residuals of a regression model are not independent
b. That the residuals of a regression model are related with one or more of the regressors
c. That the squared residuals of a regression model are not equally spread
d. That the variance of the residuals of a regression model is not constant for all observations


23. Confidence intervals of estimators estimated using OLS in the presence of serial correlation in the dataset will be

a. Larger than the confidence interval derived from GLS procedure
b. Smaller than the confidence interval derived from GLS procedure
c. Equal to the confidence interval derived from GLS procedure
d. Can't say anything about GLS procedure


24. One of the easiest ways of detecting autocorrelation is the graphical method, where we

a. Plot the error terms against their standardized values
b. Plot the error terms against each X variable
c. Plot the error terms against the Y variable
d. Plot the error terms against time


25. If, regressing Y on X, we find the errors to be autocorrelated, transforming the model into a log-linear model would help us get rid of this problem. This statement

a. Is always true
b. Is always false
c. Depends on the Durbin–Watson d statistic
d. Depends on the sign of autocorrelation


Exercises 


Questions 


12.1. State whether the following statements are true or false. Briefly justify your answer.

a. When autocorrelation is present, OLS estimators are biased as well as inefficient.
b. The Durbin–Watson d test assumes that the variance of the error term $u_t$ is homoscedastic.
c. The first-difference transformation to eliminate autocorrelation assumes that the coefficient of autocorrelation ρ is −1.
d. The $R^2$ values of two models, one involving regression in the first-difference form and another in the level form, are not directly comparable.
e. A significant Durbin–Watson d does not necessarily mean there is autocorrelation of the first order.
f. In the presence of autocorrelation, the conventionally computed variances and standard errors of forecast values are inefficient.
g. The exclusion of an important variable(s) from a regression model may give a significant d value.
h. In the AR(1) scheme, a test of the hypothesis that ρ = 1 can be made by the Berenblutt–Webb g statistic as well as the Durbin–Watson d statistic.
i. In the regression of the first difference of Y on the first differences of X, if there is a constant term and a linear trend term, it means in the original model there is a linear as well as a quadratic trend term.

12.2. Given a sample of 50 observations and 4 explanatory variables, what can you say about autocorrelation if (a) d = 1.05? (b) d = 1.40? (c) d = 2.50? (d) d = 3.97?

12.3. In studying the movement in the production workers' share in the value added (i.e., labor's share), the following models were considered by Gujarati:*
Model A: $Y_t = \beta_0 + \beta_1 t + u_t$
Model B: $Y_t = \alpha_0 + \alpha_1 t + \alpha_2 t^2 + u_t$
where Y = labor's share and t = time. Based on annual data for 1949–1964, the following results were obtained for the primary metal industry:
Model A: $\hat{Y}_t = 0.4529 - 0.0041t$    $R^2 = 0.5284$    $d = 0.8252$
                               (−3.9608)


Model B: $\hat{Y}_t = 0.4786 - 0.0127t + 0.0005t^2$    $R^2 = 0.6629$    $d = 1.82$
                               (−3.2724)   (2.7797)


where the figures in the parentheses are t ratios.
a. Is there serial correlation in model A? In model B? 
b. What accounts for the serial correlation? 
c. How would you distinguish between “pure” autocorrelation and specification bias? 
12.4. Detecting autocorrelation: von Neumann ratio test.** Assuming that the residuals $\hat{u}_t$ are random drawings from the normal distribution, von Neumann has shown that for large n, the ratio

$$\frac{\delta^2}{s^2} = \frac{\sum (\hat{u}_t - \hat{u}_{t-1})^2/(n-1)}{\sum (\hat{u}_t - \bar{u})^2/n} \qquad \text{Note: } \bar{u} = 0 \text{ in OLS}$$

called the von Neumann ratio, is approximately normally distributed with mean

$$E\left(\frac{\delta^2}{s^2}\right) = \frac{2n}{n-1}$$

and variance

$$\operatorname{var}\left(\frac{\delta^2}{s^2}\right) = \frac{4n^2(n-2)}{(n+1)(n-1)^3}$$

a. If n is sufficiently large, how would you use the von Neumann ratio to test for autocorrelation?
b. What is the relationship between the Durbin–Watson d and the von Neumann ratio?
c. The d statistic lies between 0 and 4. What are the corresponding limits for the von Neumann ratio?
d. Since the ratio depends on the assumption that the $\hat{u}$'s are random drawings from a normal distribution, how valid is this assumption for the OLS residuals?
e. Suppose in an application the ratio was found to be 2.88 with 100 observations. Test the hypothesis that there is no serial correlation in the data.
Note: B. I. Hart has tabulated the critical values of the von Neumann ratio for sample sizes of up to 60 observations.†
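The mechanics of a large-sample test based on the ratio can be sketched as follows: standardize the ratio with the mean and variance given above and compare the result with standard normal critical values. In the Python sketch below the function names and the residuals are hypothetical, and with only 12 residuals the large-sample approximation would not be trustworthy in practice; the example shows the computation only.

```python
import math


def von_neumann_ratio(resid):
    """[sum (u_t - u_{t-1})^2 / (n - 1)] / [sum (u_t - ubar)^2 / n]."""
    n = len(resid)
    ubar = sum(resid) / n
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n)) / (n - 1)
    den = sum((u - ubar) ** 2 for u in resid) / n
    return num / den


def von_neumann_z(ratio, n):
    """Standardize the ratio with its large-sample mean and variance."""
    mean = 2.0 * n / (n - 1)
    var = 4.0 * n ** 2 * (n - 2) / ((n + 1) * (n - 1) ** 3)
    return (ratio - mean) / math.sqrt(var)


# Hypothetical residuals
resid = [0.8, 0.6, 0.9, 0.2, -0.1, -0.5, -0.7, -0.4, 0.1, 0.3, -0.2, -0.6]
ratio = von_neumann_ratio(resid)
print(round(ratio, 3), round(von_neumann_z(ratio, len(resid)), 3))
```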




*Damodar Gujarati, "Labor's Share in Manufacturing Industries," Industrial and Labor Relations Review, vol. 23, no. 1, October 1969, pp. 65–75.

**J. von Neumann, "Distribution of the Ratio of the Mean Square Successive Difference to the Variance," Annals of Mathematical Statistics, vol. 12, 1941, pp. 367–395.

†The table may be found in Johnston, op. cit., 3d ed., p. 559.

12.5. In a sequence of 17 residuals, 11 positive and 6 negative, the number of runs was 3. Is there evidence of autocorrelation? Would the answer change if there were 14 runs?
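Under the null hypothesis of randomness, with $N_1$ positive residuals, $N_2$ negative residuals, and $N = N_1 + N_2$, the number of runs R is approximately normal with mean $E(R) = 2N_1N_2/N + 1$ and variance $\sigma_R^2 = 2N_1N_2(2N_1N_2 - N)/[N^2(N-1)]$. The Python sketch below counts runs in a sign sequence and returns an approximate 95 percent interval for R; the sign string is hypothetical, and with samples as small as 17 the normal approximation is rough, so exact tables of runs are preferable.

```python
import math


def count_runs(signs):
    """Number of runs (maximal blocks of identical signs) in a sequence."""
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def runs_interval(n1, n2, z=1.96):
    """Approximate interval for the number of runs under randomness."""
    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    half = z * math.sqrt(var)
    return mean - half, mean + half


signs = "++--+++-"   # hypothetical residual signs: ++ | -- | +++ | -
print(count_runs(signs))
lo, hi = runs_interval(signs.count("+"), signs.count("-"))
print(round(lo, 2), round(hi, 2))
```

If the observed number of runs falls outside the interval, the hypothesis of randomness is rejected: too few runs point to positive autocorrelation, too many to negative autocorrelation.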

12.6. Theil–Nagar ρ estimate based on d statistic. Theil and Nagar have suggested that, in small samples, instead of estimating ρ as (1 − d/2), it should be estimated as

$$\hat{\rho} = \frac{n^2(1 - d/2) + k^2}{n^2 - k^2}$$

where n = total number of observations, d = Durbin–Watson d, and k = number of coefficients (including the intercept) to be estimated.

Show that for large n, this estimate of ρ is equal to the one obtained by the simpler formula (1 − d/2).
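A quick numerical check, not a proof: the Python sketch below evaluates the Theil–Nagar formula for increasing n, with d = 1.0 and k = 4 chosen only for illustration, alongside the simpler estimate 1 − d/2.

```python
def theil_nagar_rho(d, n, k):
    """Theil-Nagar estimate [n^2 (1 - d/2) + k^2] / (n^2 - k^2)."""
    return (n ** 2 * (1 - d / 2) + k ** 2) / (n ** 2 - k ** 2)


for n in (15, 50, 200, 1000):
    print(n, round(theil_nagar_rho(d=1.0, n=n, k=4), 4), "vs", 1 - 1.0 / 2)
```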
12.7. Estimating ρ: The Hildreth–Lu scanning or search procedure.* Since in the first-order autoregressive scheme

$$u_t = \rho u_{t-1} + \varepsilon_t$$

ρ is expected to lie between −1 and +1, Hildreth and Lu suggest a systematic "scanning" or search procedure to locate it. They recommend selecting ρ between −1 and +1 using, say, 0.1 unit intervals and transforming the data by the generalized difference equation (12.6.5). Thus, one may choose ρ from −0.9, −0.8, ..., 0.8, 0.9. For each chosen ρ we run the generalized difference equation and obtain the associated RSS: $\sum \hat{\varepsilon}_t^2$. Hildreth and Lu suggest choosing that ρ which minimizes the RSS (hence maximizing the $R^2$). If further refinement is needed, they suggest using smaller unit intervals, say, 0.01 units such as −0.99, −0.98, ..., 0.90, 0.91, and so on.
a. What are the advantages of the Hildreth–Lu procedure?
b. How does one know that the ρ value ultimately chosen to transform the data will, in fact, guarantee minimum $\sum \hat{\varepsilon}_t^2$?
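The scanning procedure is easy to mechanize. The Python sketch below, written for a two-variable model, runs the generalized difference regression (first observation dropped) on a grid of ρ values and keeps the one with the smallest RSS. The function names, the grid bounds, and the simulated data (true ρ = 0.6) are illustrative assumptions.

```python
import random


def rss_generalized_difference(y, x, rho):
    """RSS from OLS of (Y_t - rho Y_{t-1}) on a constant and (X_t - rho X_{t-1})."""
    ys = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
    xs = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
    n = len(ys)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((a - xbar) ** 2 for a in xs)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
    syy = sum((b - ybar) ** 2 for b in ys)
    return syy - sxy ** 2 / sxx  # residual sum of squares of the simple regression


def hildreth_lu(y, x, step=0.1, bound=0.9):
    """Return (rho, rss) with the smallest RSS over the grid -bound, ..., +bound."""
    m = round(bound / step)
    grid = [round(i * step, 10) for i in range(-m, m + 1)]
    results = [(r, rss_generalized_difference(y, x, r)) for r in grid]
    return min(results, key=lambda pair: pair[1])


# Simulated data (illustrative): Y_t = 2 + 0.5 X_t + u_t, u_t = 0.6 u_{t-1} + e_t
rng = random.Random(1)
x = [rng.uniform(0.0, 10.0) for _ in range(500)]
y, u = [], 0.0
for xt in x:
    u = 0.6 * u + rng.gauss(0.0, 1.0)
    y.append(2.0 + 0.5 * xt + u)

rho_coarse, _ = hildreth_lu(y, x)                        # 0.1 grid
rho_fine, _ = hildreth_lu(y, x, step=0.01, bound=0.99)   # 0.01 grid
print(rho_coarse, rho_fine)
```

The grid search evaluates every candidate, so it cannot be trapped at a local minimum inside the grid, though it can only be as precise as the grid spacing.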

12.8. Estimating ρ: The Cochrane–Orcutt (C–O) iterative procedure.** As an illustration of this procedure, consider the two-variable model:

$$Y_t = \beta_1 + \beta_2 X_t + u_t \qquad (1)$$

and the AR(1) scheme

$$u_t = \rho u_{t-1} + \varepsilon_t, \qquad -1 < \rho < 1 \qquad (2)$$

Cochrane and Orcutt then recommend the following steps to estimate ρ.

1. Estimate Eq. (1) by the usual OLS routine and obtain the residuals, $\hat{u}_t$. Incidentally, note that you can have more than one X variable in the model.
2. Using the residuals obtained in step 1, run the following regression:

$$\hat{u}_t = \hat{\rho}\,\hat{u}_{t-1} + v_t \qquad (3)$$

which is the empirical counterpart of Eq. (2).†
3. Using $\hat{\rho}$ obtained in Eq. (3), estimate the generalized difference equation (12.9.6).


*G. Hildreth and J. Y. Lu, "Demand Relations with Autocorrelated Disturbances," Michigan State University, Agricultural Experiment Station, Tech. Bull. 276, November 1960.

**D. Cochrane and G. H. Orcutt, "Applications of Least-Squares Regressions to Relationships Containing Autocorrelated Error Terms," Journal of the American Statistical Association, vol. 44, 1949, pp. 32–61.

†Note that $\hat{\rho} = \sum_{t=2}^{n} \hat{u}_t \hat{u}_{t-1} / \sum_{t=2}^{n} \hat{u}_{t-1}^2$ (why?). Although biased, $\hat{\rho}$ is a consistent estimator of the true ρ.

4. Since a priori it is not known if the $\hat{\rho}$ obtained from Eq. (3) is the best estimate of ρ, substitute the values of $\hat{\beta}_1^*$ and $\hat{\beta}_2^*$ obtained in step (3) in the original regression Eq. (1) and obtain the new residuals, say, $\hat{u}_t^*$, as

$$\hat{u}_t^* = Y_t - \hat{\beta}_1^* - \hat{\beta}_2^* X_t \qquad (4)$$

which can be easily computed since $Y_t$, $X_t$, $\hat{\beta}_1^*$, and $\hat{\beta}_2^*$ are all known.

5. Now estimate the following regression:

$$\hat{u}_t^* = \hat{\rho}^* \hat{u}_{t-1}^* + w_t \qquad (5)$$

which is similar to Eq. (3) and thus provides the second-round estimate of ρ.

Since we do not know whether this second-round estimate of ρ is the best estimate of the true ρ, we go into the third-round estimate, and so on. That is why the C–O procedure is called an iterative procedure. But how long should we go on this (merry-)go-round? The general recommendation is to stop carrying out iterations when the successive estimates of ρ differ by a small amount, say, by less than 0.01 or 0.005. In our wages–productivity example, it took about three iterations before we stopped.

a. Use the Cochrane–Orcutt iterative procedure to estimate ρ for the wages–productivity regression, Eq. (12.5.2). How many iterations were involved before you obtained the "final" estimate of ρ?

b. Using the final estimate of ρ obtained in (a), estimate the wages–productivity regression, dropping the first observation as well as retaining the first observation. What difference do you see in the results?

c. Do you think that it is important to keep the first observation in transforming the data to solve the autocorrelation problem?
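Steps 1 through 5 can be sketched in Python for the two-variable model. This is an illustrative implementation under stated assumptions: the first observation is dropped in each generalized difference regression, iteration stops when successive estimates of ρ differ by less than a tolerance (0.001 here, a hypothetical choice in the spirit of the 0.01 or 0.005 suggested above), and the data are simulated, not the wages–productivity data.

```python
import random


def ols(y, x):
    """Intercept and slope of the OLS regression of y on a constant and x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    b2 = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
          / sum((a - xbar) ** 2 for a in x))
    return ybar - b2 * xbar, b2


def rho_from_residuals(u):
    """OLS through the origin of u_t on u_{t-1}, as in Eqs. (3) and (5)."""
    num = sum(u[t] * u[t - 1] for t in range(1, len(u)))
    den = sum(u[t - 1] ** 2 for t in range(1, len(u)))
    return num / den


def cochrane_orcutt(y, x, tol=1e-3, max_iter=50):
    """Iterate steps 1-5; return (b1, b2, rho, number of iterations)."""
    b1, b2 = ols(y, x)                                                    # step 1
    rho = rho_from_residuals([yt - b1 - b2 * xt for xt, yt in zip(x, y)])  # step 2
    for iteration in range(1, max_iter + 1):
        # Step 3: generalized difference regression (first observation dropped)
        ys = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
        xs = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
        a, b2 = ols(ys, xs)
        b1 = a / (1.0 - rho)          # the transformed intercept is b1 * (1 - rho)
        # Steps 4-5: new residuals from Eq. (1), then a new estimate of rho
        new_rho = rho_from_residuals([yt - b1 - b2 * xt for xt, yt in zip(x, y)])
        if abs(new_rho - rho) < tol:
            return b1, b2, new_rho, iteration
        rho = new_rho
    return b1, b2, rho, max_iter


# Simulated data (illustrative): Y_t = 2 + 0.5 X_t + u_t, u_t = 0.6 u_{t-1} + e_t
rng = random.Random(7)
x = [rng.uniform(0.0, 10.0) for _ in range(1000)]
y, u = [], 0.0
for xt in x:
    u = 0.6 * u + rng.gauss(0.0, 1.0)
    y.append(2.0 + 0.5 * xt + u)

b1_hat, b2_hat, rho_hat, n_iter = cochrane_orcutt(y, x)
print(b1_hat, b2_hat, rho_hat, n_iter)
```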

12.9. Estimating ρ: The Cochrane–Orcutt two-step procedure. This is a shortened version of the C–O iterative procedure. In step 1, we estimate ρ from the first iteration, that is, from Eq. (3) in the preceding exercise, and in step 2 we use that estimate of ρ to run the generalized difference equation, as in Eq. (4) in the preceding exercise. Sometimes in practice, this two-step method gives results quite similar to those obtained from the more elaborate C–O iterative procedure.

Apply the C–O two-step method to the illustrative wages–productivity regression (12.5.1) given in this chapter and compare your results with those obtained from the iterative method. Pay special attention to the first observation in the transformation.
12.10. Estimating ρ: Durbin's two-step method.* To explain this method, we can write the generalized difference equation (12.9.5) equivalently as follows:

$$Y_t = \beta_1(1-\rho) + \beta_2 X_t - \beta_2\rho X_{t-1} + \rho Y_{t-1} + \varepsilon_t \qquad (1)$$

Durbin suggests the following two-step procedure to estimate ρ. First, treat Eq. (1) as a multiple regression model, regressing $Y_t$ on $X_t$, $X_{t-1}$, and $Y_{t-1}$, and treat the estimated value of the regression coefficient of $Y_{t-1}$ ($= \hat{\rho}$) as an estimate of ρ. Second, having obtained $\hat{\rho}$, use it to estimate the parameters of generalized difference equation (12.9.5) or its equivalent, Eq. (12.9.6).

a. Apply the Durbin two-step method to the wages–productivity example discussed in this chapter and compare your results with those obtained from the Cochrane–Orcutt iterative procedure and the C–O two-step method. Comment on the "quality" of your results.

b. If you examine Eq. (1) above, you will observe that the coefficient of $X_{t-1}$ ($= -\rho\beta_2$) is equal to minus 1 times the product of the coefficient of $X_t$ ($= \beta_2$) and the coefficient of $Y_{t-1}$ ($= \rho$). How would you test that coefficients obey the preceding restriction?


*J. Durbin, "Estimation of Parameters in Time-Series Regression Models," Journal of the Royal Statistical Society, series B, vol. 22, 1960, pp. 139–153.

12.11. In measuring returns to scale in electricity supply, Nerlove used cross-sectional data of 145 privately owned utilities in the United States for the year 1955 and regressed the log of total cost on the logs of output, wage rate, price of capital, and price of fuel. He found that the residuals estimated from this regression exhibited "serial" correlation, as judged by the Durbin–Watson d. To seek a remedy, he plotted the estimated residuals on the log of output and obtained Figure 12.11.

a. What does Figure 12.11 show?

b. How can you get rid of "serial" correlation in the preceding situation?

12.12. The residuals from a regression when plotted against time gave the scattergram in Figure 12.12. The encircled "extreme" residual is called an outlier. An outlier is an observation whose value exceeds the values of other observations in the sample by a large amount, perhaps three or four standard deviations away from the mean value of all the observations.

a. What are the reasons for the existence of the outlier(s)?

b. If there is an outlier(s), should that observation(s) be discarded and the regression run on the remaining observations?

c. Is the Durbin–Watson d applicable in the presence of the outlier(s)?


[Figure 12.11: regression residuals $\hat{u}_i$ plotted against log (output).]
Figure 12.11 Regression residuals from the Nerlove study. (Adapted from Marc Nerlove, "Returns to Scale in Electricity Supply," in Carl F. Christ et al., Measurement in Economics, Stanford University Press, Stanford, Calif., 1963.)

[Figure 12.12: regression residuals $\hat{u}_t$ plotted against time.]
Figure 12.12 Hypothetical regression residuals plotted against time.

12.13. Based on the Durbin–Watson d statistic, how would you distinguish "pure" autocorrelation from specification bias?
12.14. Suppose in the model

$$Y_t = \beta_1 + \beta_2 X_t + u_t$$

the u's are in fact serially independent. What would happen in this situation if, assuming that $u_t = \rho u_{t-1} + \varepsilon_t$, we were to use the following generalized difference regression?

$$Y_t - \rho Y_{t-1} = \beta_1(1-\rho) + \beta_2 X_t - \rho\beta_2 X_{t-1} + \varepsilon_t$$

Discuss in particular the properties of the disturbance term $\varepsilon_t$.
12.15. In a study of the determination of prices of final output at factor cost in the United Kingdom, the following results were obtained on the basis of annual data for the period 1951–1969:*

$$\widehat{PF}_t = 2.033 + 0.273W_t - 0.521X_t + 0.256M_t + 0.028M_{t-1} + 0.121PF_{t-1}$$
se = (0.992) (0.127) (0.099) (0.024) (0.039) (0.119)
$R^2 = 0.984$    $d = 2.54$

where PF = prices of final output at factor cost, W = wages and salaries per employee, X = gross domestic product per person employed, M = import prices, $M_{t-1}$ = import prices lagged 1 year, and $PF_{t-1}$ = prices of final output at factor cost in the previous year.

"Since for 18 observations and 5 explanatory variables, the 5 percent lower and upper d values are 0.71 and 2.06, the estimated d value of 2.54 indicates that there is no positive autocorrelation." Comment.

12.16. Give circumstances under which each of the following methods of estimating the first-order coefficient of autocorrelation ρ may be appropriate:

a. First-difference regression.

b. Moving average regression.

c. Theil–Nagar transform.

d. Cochrane and Orcutt iterative procedure.

e. Hildreth–Lu scanning procedure.

f. Durbin two-step procedure.


12.17. Consider the model:

$$Y_t = \beta_1 + \beta_2 X_t + u_t$$

where

$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \varepsilon_t$$

that is, the error term follows an AR(2) scheme and $\varepsilon_t$ is a white noise error term. Outline the steps you would take to estimate the model taking into account the second-order autoregression.

*Source: Prices and Earnings in 1951–1969: An Econometric Assessment, Department of Employment, Her Majesty's Stationery Office, 1971, Table C, p. 37, Eq. 63.

12.18. Including the correction factor C, the formula for $\hat{\beta}_2^{\text{GLS}}$ given in Eq. (12.3.1) is

$$\hat{\beta}_2^{\text{GLS}} = \frac{(1-\rho^2)x_1y_1 + \sum_{t=2}^{n} (x_t - \rho x_{t-1})(y_t - \rho y_{t-1})}{(1-\rho^2)x_1^2 + \sum_{t=2}^{n} (x_t - \rho x_{t-1})^2}$$

Given this formula and Eq. (12.3.1), find the expression for the correction factor C.
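The role of the first-observation terms is easy to see numerically. The Python sketch below evaluates the formula with and without the $(1-\rho^2)$ terms, treating x and y as deviations from their sample means and ρ as known (both assumptions for illustration, with hypothetical data); the gap between the two slopes is the quantity the correction factor C accounts for.

```python
def deviations(z):
    """Deviations from the sample mean (the lowercase x, y of the formula)."""
    mean = sum(z) / len(z)
    return [v - mean for v in z]


def gls_slope(x, y, rho, include_first=True):
    """Slope from the formula above for deviation-form x, y and a known rho."""
    num = sum((x[t] - rho * x[t - 1]) * (y[t] - rho * y[t - 1]) for t in range(1, len(x)))
    den = sum((x[t] - rho * x[t - 1]) ** 2 for t in range(1, len(x)))
    if include_first:
        num += (1 - rho ** 2) * x[0] * y[0]  # first-observation terms
        den += (1 - rho ** 2) * x[0] ** 2
    return num / den


X = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]   # hypothetical data
Y = [2.0, 4.5, 3.0, 7.0, 6.5, 8.0]
x, y = deviations(X), deviations(Y)
full = gls_slope(x, y, rho=0.5)
dropped = gls_slope(x, y, rho=0.5, include_first=False)
print(full, dropped, full - dropped)  # the difference plays the role of C
```

As a sanity check, setting ρ = 0 makes the full formula collapse to the OLS slope $\sum x_t y_t / \sum x_t^2$.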

12.19. Show that estimating Eq. (12.9.5) is equivalent to estimating the GLS discussed in Section 12.3, 
excluding the first observation on Y and X. 

12.20. For regression (12.9.9), the estimated residuals have the following signs, which for ease of exposition 
are bracketed. 

GH) eae E aE TE 
(+)(--------- +) 
On the basis of the runs test, do you reject the null hypothesis that there is no autocorrelation in the residuals?

*12.21. Testing for higher-order serial correlation. Suppose we have time series data on a quarterly basis. In regression models involving quarterly data, instead of using the AR(1) scheme given in Eq. (12.2.1), it may be more appropriate to assume an AR(4) scheme as follows:

$$u_t = \rho_4 u_{t-4} + \varepsilon_t$$

that is, to assume that the current disturbance term is correlated with that of the same quarter in the previous year rather than that of the preceding quarter.

To test the hypothesis that $\rho_4 = 0$, Wallis** suggests the following modified Durbin–Watson d test:

$$d_4 = \frac{\sum_{t=5}^{n} (\hat{u}_t - \hat{u}_{t-4})^2}{\sum_{t=1}^{n} \hat{u}_t^2}$$

The testing procedure follows the usual d test routine discussed in the text. Wallis has prepared $d_4$ tables, which may be found in his original article.

Suppose now we have monthly data. Could the Durbin–Watson test be generalized to take into account such data? If so, write down the appropriate $d_{12}$ formula.
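Wallis's statistic is the ordinary d with the lag of 1 replaced by 4, which suggests a general lag-s version, $d_s = \sum_{t=s+1}^{n} (\hat{u}_t - \hat{u}_{t-s})^2 / \sum_{t=1}^{n} \hat{u}_t^2$. The Python sketch below computes it for any s; the function name and the residuals are hypothetical. It computes the statistic only: critical values for s ≠ 1 must come from tables such as Wallis's, not from the ordinary Durbin–Watson tables.

```python
def lagged_d(resid, s):
    """Generalized Durbin-Watson statistic at lag s."""
    num = sum((resid[t] - resid[t - s]) ** 2 for t in range(s, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


# Hypothetical quarterly residuals with a pattern that repeats every 4 quarters
resid = [0.3, -0.1, -0.4, 0.2] * 3
print(round(lagged_d(resid, 1), 3))  # ordinary Durbin-Watson d
print(round(lagged_d(resid, 4), 3))  # Wallis's d4; zero here because the pattern repeats exactly
```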

12.22. Suppose you estimate the following regression:

$$\Delta \ln Y_t = \beta_1 + \beta_2 \Delta \ln L_t + \beta_3 \Delta \ln K_t + u_t$$

where Y is output, L is labor input, K is capital input, and Δ is the first-difference operator. How would you interpret $\beta_1$ in this model? Could it be regarded as an estimate of technological change? Justify your answer.

12.23. As noted in the text, Maddala has suggested that if the Durbin–Watson d is smaller than $R^2$, one may run the regression in the first-difference form. What is the logic behind this suggestion?

*Optional. 


**Kenneth Wallis, “Testing for Fourth Order Autocorrelation in Quarterly Regression Equations,” Econometrica, vol. 40,
1972, pp. 617-636. Tables of $d_4$ can also be found in J. Johnston, op. cit., 3d ed., p. 558.

12.24. Refer to Eq. (12.4.1). Assume $r = 0$ but $\rho \neq 0$. What is the effect on $E(\hat\sigma^2)$ if (a) $0 < \rho < 1$ and (b) $-1 < \rho < 0$?
When will the bias in $\hat\sigma^2$ be reasonably small?

12.25. The residuals from the wages—productivity regression given in Eq. (12.5.2) were regressed on lagged 
residuals going back six periods (i.e., AR[6]), yielding the following results: 


Dependent Variable: S1
Method: Least Squares
Sample (adjusted): 1966-2005
Included observations: 40 after adjustments

Variable    Coefficient    Std. Error    t-Statistic    Prob.
S1(-1)       1.019716      0.170999       5.963275     0.0000
S1(-2)      -0.029679      0.244152      -0.121560     0.9040
S1(-3)      -0.286782      0.241975      -1.185171     0.2442
S1(-4)       0.149212      0.242076       0.616386     0.5417
S1(-5)      -0.071371      0.243386      -0.293240     0.7711
S1(-6)       0.034362      0.167987       0.205663     0.8383

R-squared             0.749857    Mean dependent var.    0.004433
Adjusted R-squared    0.713071    S.D. dependent var.    0.019843
S.E. of regression    0.010629    Durbin-Watson stat.    1.956828
Sum squared resid.    0.003841


a. From the preceding results, what can you say about the nature of autocorrelation in the logarithmic 
wages—productivity data? 

b. If you think that an AR(1) mechanism characterizes autocorrelation in our data, would you use the 
first-difference transformation to get rid of autocorrelation? Justify your answer. 


Empirical Exercises 


12.26. Refer to the data on the copper industry given in Table 12.7. 
a. From these data estimate the following regression model: 


$$\ln C_t = \beta_1 + \beta_2 \ln I_t + \beta_3 \ln L_t + \beta_4 \ln H_t + \beta_5 \ln A_t + u_t$$


Interpret the results. 

b. Obtain the residuals and standardized residuals from the preceding regression and plot them. What 
can you surmise about the presence of autocorrelation in these residuals? 

c. Estimate the Durbin—Watson d statistic and comment on the nature of autocorrelation present in 
the data. 

d. Carry out the runs test and see if your answer differs from that just given in (c). 

e. How would you find out if an AR(p) process better describes autocorrelation than an AR(1) 
process? 

Note: Save the data for further analysis. (See Exercise 12.28.)

Table 12.7 Determinants of U.S. Domestic Price of Copper, 1951-1980

Year     C        G         I       L        H         A
1951    21.89    330.2     45.1    220.4   1,491.0   19.00
  52    22.29    347.2     50.9    259.5   1,504.0   19.41
  53    19.63    366.1     53.3    256.3   1,438.0   20.93
  54    22.85    366.3     53.6    249.3   1,551.0   21.78
  55    33.77    399.3     54.6    352.3   1,646.0   23.68
  56    39.18    420.7     61.1    329.1   1,349.0   26.01
  57    30.58    442.0     61.9    219.6   1,224.0   27.52
  58    26.30    447.0     57.9    234.8   1,382.0   26.89
  59    30.70    483.0     64.8    237.4   1,553.7   26.85
  60    32.10    506.0     66.2    245.8   1,296.1   27.23
  61    30.00    523.3     66.7    229.2   1,365.0   25.46
  62    30.80    563.8     72.2    233.9   1,492.5   23.88
  63    30.80    594.7     76.5    234.2   1,634.9   22.62
  64    32.60    635.7     81.7    347.0   1,561.0   23.72
  65    35.40    688.1     89.8    468.1   1,509.7   24.50
  66    36.60    753.0     97.8    555.0   1,195.8   24.50
  67    38.60    796.3    100.0    418.0   1,321.9   24.98
  68    42.20    868.5    106.3    525.2   1,545.4   25.58
  69    47.90    935.5    111.1    620.7   1,499.5   27.18
  70    58.20    982.4    107.8    588.6   1,469.0   28.72
  71    52.00  1,063.4    109.6    444.4   2,084.5   29.00
  72    51.20  1,171.1    119.7    427.8   2,378.5   26.67
  73    59.50  1,306.6    129.8    727.1   2,057.5   25.33
  74    77.30  1,412.9    129.3    877.6   1,352.5   34.06
  75    64.20  1,528.8    117.8    556.6   1,171.4   39.79
  76    69.60  1,700.1    129.8    780.6   1,547.6   44.49
  77    66.80  1,887.2    137      750.7   1,989.8   51.23
  78    66.50  2,127.6    145.2    709.8   2,023.3   54.42
  79    98.30  2,628.8    152.5    T9357   1,749.2   61.01
  80   101.40  2,633.1    147.1    940.9   1,298.5   70.87


Note: The data were collected by Gary R. Smith from sources such as American Metal Market, Metals Week, and U.S.
Department of Commerce publications.
C = 12-month average U.S. domestic price of copper (cents per pound). 
G = annual gross national product ($, billions). 
I = 12-month average index of industrial production. 
L = 12-month average London Metal Exchange price of copper (pounds sterling). 
H = number of housing starts per year (thousands of units). 
A = 12-month average price of aluminum (cents per pound). 


12.27. You are given the data in Table 12.8. 
a. Verify that Durbin—Watson d = 1.761. 
b. Is there positive serial correlation in the disturbances? 
c. If so, estimate $\rho$ by the
i. Theil–Nagar method.

ii. Durbin two-step procedure. 

iii. Cochrane—Orcutt method. 
d. Use the Theil-Nagar method to transform the data and run the regression on the transformed data. 
e. Does the regression estimated in (d) exhibit autocorrelation? If so, how would you get rid of it? 
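As a check on part (a), d can be computed directly from the residual column of Table 12.8. The sketch below is a minimal pure-Python version with hypothetical helper names. It also evaluates two of the ρ estimates mentioned in part (c): the rough approximation ρ̂ ≈ 1 − d/2 and the Theil–Nagar formula as it is commonly stated, ρ̂ = [n²(1 − d/2) + k²]/(n² − k²). Here k is assumed to be the number of estimated coefficients including the intercept, which is 2 (intercept and time) in this regression.

```python
# Residuals from Table 12.8 (1950-51 through 1969-70).
residuals = [
    -1505.34, 2400.84, 2156.02, 6656.21, 5518.39,
    -1028.43, 975.76, -12973.06, 1262.12, -4486.69,
    2450.49, -1416.32, -6399.14, -4014.96, 5865.23,
    -2715.59, -7264.41, 2944.78, 3431.96, 8142.14,
]


def durbin_watson(e):
    """Ordinary Durbin-Watson d: sum of squared successive differences over RSS."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(x ** 2 for x in e)


def rho_from_d(d):
    """Rough large-sample estimate rho ~ 1 - d/2."""
    return 1 - d / 2


def rho_theil_nagar(d, n, k):
    """Theil-Nagar estimate [n^2 (1 - d/2) + k^2] / (n^2 - k^2),
    with k = number of coefficients including the intercept."""
    return (n ** 2 * (1 - d / 2) + k ** 2) / (n ** 2 - k ** 2)


d = durbin_watson(residuals)
n, k = len(residuals), 2  # intercept + time trend
print(round(d, 3), round(rho_from_d(d), 4), round(rho_theil_nagar(d, n, k), 4))
```

The Durbin two-step and Cochrane–Orcutt estimates in (c) require re-running regressions on the original data, so they are not included in this sketch.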


Table 12.8

Year       Y          X    Estimated Y    Residuals
1950-51    201,090     1   202,595.34     -1,505.34
1951-52    213,872     2   211,471.16      2,400.84
1952-53    222,503     3   220,346.98      2,156.02
1953-54    235,879     4   229,222.79      6,656.21
1954-55    243,617     5   238,098.61      5,518.39
1955-56    245,946     6   246,974.43     -1,028.43
1956-57    256,826     7   255,850.24        975.76
1957-58    251,753     8   264,726.06    -12,973.06
1958-59    274,864     9   273,601.88      1,262.12
1959-60    277,991    10   282,477.69     -4,486.69
1960-61    293,804    11   291,353.51      2,450.49
1961-62    298,813    12   300,229.32     -1,416.32
1962-63    302,706    13   309,105.14     -6,399.14
1963-64    313,966    14   317,980.96     -4,014.96
1964-65    332,722    15   326,856.77      5,865.23
1965-66    333,017    16   335,732.59     -2,715.59
1966-67    337,344    17   344,608.41     -7,264.41
1967-68    356,429    18   353,484.22      2,944.78
1968-69    365,792    19   362,360.04      3,431.96
1969-70    379,378    20   371,235.86      8,142.14

Y = private final consumption expenditure, in Rs. crore; X = time.

12.28. Refer to Exercise 12.26 and the data given in Table 12.7. If the results of this exercise show serial
correlation,

a. Use the Cochrane–Orcutt two-stage procedure and obtain the estimates of the feasible GLS or the
generalized difference regression and compare your results.

b. If the $\rho$ estimated from the Cochrane–Orcutt method in (a) differs substantially from that estimated
from the d statistic, which method of estimating $\rho$ would you choose and why?

12.29. Refer to Example 7.4. Omitting the variables $X_i^2$ and $X_i^3$, run the regression and examine the residuals
for “serial” correlation. If serial correlation is found, how would you rationalize it? What remedial
measures would you suggest?

12.30. Refer to Exercise 7.21. A priori autocorrelation is expected in such data. Therefore, it is suggested
that you regress the log of real money supply on the logs of real national income and interest rate in
the first-difference form. Run this regression, and then rerun the regression in the original form. Is the
assumption underlying the first-difference transformation satisfied? If not, what kinds of biases are
likely to result from such a transformation? Illustrate with the data at hand.

12.31. The use of Durbin–Watson d for testing nonlinearity. Continue with Exercise 12.29. Arrange the
residuals obtained in that regression according to increasing values of X. Using the formula given in
Eq. (12.6.5), estimate d from the rearranged residuals. If the computed d value indicates autocorrelation,
this would imply that the linear model was incorrect and that the full model should include $X_i^2$
and $X_i^3$ terms. Can you give an intuitive justification for such a procedure? See if your answer agrees
with that given by Henri Theil.†

12.32. Refer to Exercise 11.22. Obtain the residuals and find out if there is autocorrelation in the residuals.
How would you transform the data in case serial correlation is detected? What is the meaning of serial
correlation in the present instance?
12.33. Monte Carlo experiment. Refer to Tables 12.1 and 12.2. Using the $\varepsilon_t$ and $X_t$ data given there, generate a
sample of 10 Y values from the model

$$Y_t = 3.0 + 0.5X_t + u_t$$

where $u_t = 0.9u_{t-1} + \varepsilon_t$. Assume $u_0 = 10$.
a. Estimate the equation and comment on your results.
b. Now assume $u_0 = 17$. Repeat this exercise 10 times and comment on the results.
c. Keep the preceding setup intact except now let $\rho = 0.3$ instead of $\rho = 0.9$ and compare your results
with those given in (b).
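A minimal sketch of this experiment in pure Python follows. Tables 12.1 and 12.2 are not reproduced here, so the sketch substitutes hypothetical inputs, X = 1, ..., 10 and standard normal ε draws; replace them with the tabulated values to follow the exercise exactly. The helper names `ols_simple` and `simulate` are illustrative.

```python
import random


def ols_simple(x, y):
    """Return (intercept, slope) of the OLS line of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def simulate(x, eps, rho, u0, b1=3.0, b2=0.5):
    """Generate Y_t = b1 + b2*X_t + u_t with u_t = rho*u_{t-1} + eps_t,
    starting from u_0 = u0, and return the OLS estimates (b1_hat, b2_hat)."""
    u_prev, y = u0, []
    for xt, et in zip(x, eps):
        u_prev = rho * u_prev + et  # AR(1) disturbance
        y.append(b1 + b2 * xt + u_prev)
    return ols_simple(x, y)


# Hypothetical inputs standing in for Tables 12.1 and 12.2.
rng = random.Random(12)
x = list(range(1, 11))
for rho in (0.9, 0.3):
    for rep in range(3):
        eps = [rng.gauss(0, 1) for _ in x]
        print(rho, rep, simulate(x, eps, rho, u0=10.0))
```

Fixing the seed makes each run reproducible; drawing fresh ε values inside the loop mimics the "repeat this exercise 10 times" step.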
12.34. Using the data given in Table 12.9, estimate the model

$$Y_t = \beta_1 + \beta_2 X_t + u_t$$
where Y = inventories and X = sales, both measured in billions of dollars. 


Table 12.9 Inventories and Sales in U.S. Manufacturing, 1950-1991 (millions of dollars)

Year   Sales*    Inventories**   Ratio    Year   Sales*    Inventories**   Ratio
1950    46,486     84,646        1.82     1971   224,619   369,374         1.57
1951    50,229     90,560        1.80     1972   236,698   391,212         1.63
1952    53,501     98,145        1.83     1973   242,686   405,073         1.65
1953    52,805    101,599        1.92     1974   239,847   390,950         1.65
1954    55,906    102,567        1.83     1975   250,394   382,510         1.54
1955    63,027    108,121        1.72     1976   242,002   378,762         1.57
1956    72,931    124,499        1.71     1977   251,708   379,706         1.50
1957    84,790    157,625        1.86     1978   269,843   399,970         1.44
1958    86,589    159,708        1.84     1979   289,973   424,843         1.44
1959    98,797    174,636        1.77     1980   299,766   430,518         1.43
1960   113,201    188,378        1.66     1981   319,558   443,622         1.37
1961   126,905    211,691        1.67     1982   324,984   449,083         1.38
1962   143,936    242,157        1.68     1983   335,991   463,563         1.35
1963   154,391    265,215        1.72     1984   350,715   481,633         Hes
1964   168,129    283,413        1.69     1985   330,875   428,108         1.38
1965   163,351    311,852        1.95     1986   326,227   423,082         1.29
1966   172,547    312,379        1.78     1987   334,616   408,226         1.24
1967   190,682    339,516        EE       1988   359,081   439,821         1.18
1968   194,538    334,749        1.73     1989   394,615   479,106         1.17
1969   194,657    322,654        1.68     1990   411,663   509,902         1.21
1970   206,326    338,109        1.59

*Annual data are averages of monthly, not seasonally adjusted, figures.
**Seasonally adjusted, end-of-period figures beginning 1982 are not comparable with earlier period.
Source: Economic Report of the President, 1993, Table B-53, p. 408.
a. Estimate the preceding regression. 
b. From the estimated residuals find out if there is positive autocorrelation using (i) the Durbin–Watson
test and (ii) the large-sample normality test given in Eq. (12.6.13).
c. If $\rho$ is positive, apply the Berenblut–Webb test to test the hypothesis that $\rho = 1$.
d. If you suspect that the autoregressive error structure is of order p, use the Breusch–Godfrey test to
verify this. How would you choose the order of p?
e. On the basis of the results of this test, how would you transform the data to remove autocorre- 
lation? Show all your calculations. 
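For part (d), one common form of the Breusch–Godfrey test regresses the OLS residuals on the original regressors and p lagged residuals, then compares (n − p)R² with a χ²(p) critical value. The sketch below is a minimal pure-Python version under that assumption. It drops the first p observations rather than padding the lagged residuals with zeros (software packages differ on this choice), and the helper names `ols` and `breusch_godfrey` are hypothetical. The usage lines feed it a made-up series; substitute the Table 12.9 columns to follow the exercise.

```python
import math


def ols(y, cols):
    """OLS of y on the regressor columns in `cols` (include a column of ones
    for the intercept). Solves the normal equations X'X b = X'y by
    Gauss-Jordan elimination with partial pivoting; returns (b, residuals)."""
    k, n = len(cols), len(y)
    # Augmented matrix [X'X | X'y].
    a = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
         + [sum(cols[i][t] * y[t] for t in range(n))] for i in range(k)]
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[pivot] = a[pivot], a[p]
        for r in range(k):
            if r != p:
                f = a[r][p] / a[p][p]
                a[r] = [ar - f * ap for ar, ap in zip(a[r], a[p])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    resid = [y[t] - sum(beta[j] * cols[j][t] for j in range(k)) for t in range(n)]
    return beta, resid


def breusch_godfrey(y, x, p):
    """(n - p) * R^2 from regressing u_hat_t on 1, x_t, u_hat_{t-1}, ..., u_hat_{t-p}.
    Asymptotically chi-square with p df under the null of no autocorrelation."""
    n = len(y)
    _, u = ols(y, [[1.0] * n, list(x)])            # step 1: original regression
    dep = u[p:]                                    # u_hat_t for t = p+1, ..., n
    cols = [[1.0] * (n - p), list(x[p:])]
    cols += [u[p - j:n - j] for j in range(1, p + 1)]  # lagged residuals
    _, e = ols(dep, cols)                          # step 2: auxiliary regression
    mean = sum(dep) / len(dep)
    tss = sum((v - mean) ** 2 for v in dep)
    r2 = 1 - sum(v ** 2 for v in e) / tss
    return (n - p) * r2


# Hypothetical series with smoothly varying (hence autocorrelated) deviations.
x = [float(t) for t in range(1, 31)]
y = [2.0 + 0.5 * t + 3.0 * math.sin(0.4 * t) for t in range(1, 31)]
stat = breusch_godfrey(y, x, p=1)
print(round(stat, 3), stat > 3.841)  # 3.841 = 5% critical value of chi-square(1)
```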


†Henri Theil, Introduction to Econometrics, Prentice Hall, Englewood Cliffs, NJ, 1978, pp. 307-308.

f. Repeat the preceding steps using the following model:

$$\ln Y_t = \beta_1 + \beta_2 \ln X_t + u_t$$

g. How would you decide between the linear and log-linear specifications? Show explicitly the test(s)
you use.


12.35. Table 12.10 gives data on the real rate of return on common stocks at time t ($RR_t$), output growth in period
(t + 1) ($OG_{t+1}$), and inflation in period t ($Inf_t$), all in percent form, for the U.S. economy for the period
1954-1981.


Table 12.10 Rate of Return, Output Growth and Inflation, United States, 1954-1981 


Observation    RR       Growth    Inflation
1954           53.0     6.7       -0.4
1955           312      2.1        0.4
1956           3.7      1.8        2.9
1957           138     -0.4        3.0
1958           41.7     6.0       127
1959           10.5     2i        15
1960           =i       2.6        1.8
1961           26.1     5.8        0.8
1962          -10.5     4.0        1.8
1963           ee       5.3        1.6
1964           155      6.0        1.0
1965           10.2    -6.0       233
1966          -13.3     Luh,       3.2
1967           2123     4.6        2.7
1968           6.8      2.8        4.3
1969          -13.5     Oe         5.0
1970          -0.4      3.4        4.4
1971           10.5     5.7        3.8
1972           15.4     5.8        3.6
1973          -22.6    -0.6       To
1974          -37.3     2         10.8
1975           312      5.4        6.0
1976           UEL      5.9        4.7
1977          -13.1     5.0       59
1978          -1.3      2.8       T9
1979           8.6     -0.3        9.8
1980          -22.2     216       10.2
1981          -12.2    -1.9        1.3
a. Regress $RR_t$ on inflation.

b. Regress $RR_t$ on $OG_{t+1}$ and $Inf_t$.

c. Comment on the two regression results in view of Eugene Fama’s observation that “the negative
simple correlation between real stock returns and inflation is spurious because it is the result of
two structural relationships: a positive relation between current real stock returns and expected
output growth [measured by $OG_{t+1}$], and a negative relationship between expected output growth
and current inflation.”

d. Would you expect autocorrelation in either of the regressions in (a) and (b)? Why or why not? If
you do, take the appropriate corrective action and present the revised results.

12.36. The Durbin h statistic. Consider the following model of wage determination:

$$Y_t = \beta_1 + \beta_2 X_t + \beta_3 Y_{t-1} + u_t$$

where Y = wages = index of real compensation per hour
X = productivity = index of output per hour.

a. Using the data in Table 12.4, estimate the above model and interpret your results. 

b. Since the model contains the lagged regressand as a regressor, the Durbin–Watson d is not appropriate
for finding out whether there is serial correlation in the data. For such models, called autoregressive
models, Durbin* has developed the so-called h statistic to test for first-order autocorrelation, which
is defined as:

$$h = \hat\rho \sqrt{\frac{n}{1 - n[\operatorname{var}(\hat\beta_3)]}}$$

where n = sample size, $\operatorname{var}(\hat\beta_3)$ = variance of the coefficient of the lagged $Y_{t-1}$, and $\hat\rho$ = estimate of
the first-order serial correlation.

For large sample size (technically, asymptotic), Durbin has shown that, under the null hypothesis
that $\rho = 0$,

$$h \sim N(0, 1)$$

that is, the h statistic follows the standard normal distribution. From the properties of the normal
distribution we know that the probability of $|h| > 1.96$ is about 5 percent. Therefore, if in an
application $|h| > 1.96$, we can reject the null hypothesis that $\rho = 0$, that is, there is evidence of first-order
autocorrelation in the autoregressive model given above.

To apply the test, we proceed as follows: First, estimate the above model by OLS (don’t worry
about any estimation problems at this stage). Second, note $\operatorname{var}(\hat\beta_3)$ in this model as well as the
routinely computed d statistic. Third, using the d value, obtain $\hat\rho \approx 1 - d/2$. It is interesting to
note that although we cannot use the d value to test for serial correlation in this model, we can
use it to obtain an estimate of $\rho$. Fourth, compute the h statistic. Fifth, if the sample size is
reasonably large and if the computed $|h|$ exceeds 1.96, we can conclude that there is evidence of
first-order autocorrelation. Of course, you can use any level of significance you want.

Apply the h test to the autoregressive wage determination model given earlier, draw appropriate
conclusions, and compare your results with those given in regression (12.5.1).
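The five steps above reduce to a small function once d, n, and $\operatorname{var}(\hat\beta_3)$ have been read off the regression output. This is a sketch: the function name `durbin_h` is hypothetical, and the numbers in the usage line are made up, not the Table 12.4 results.

```python
import math


def durbin_h(d, n, var_b3):
    """Durbin h = rho_hat * sqrt(n / (1 - n * var_b3)), with rho_hat ~ 1 - d/2.

    var_b3 is the estimated variance of the coefficient on Y_{t-1}. The
    statistic is undefined when n * var_b3 >= 1.
    """
    if n * var_b3 >= 1:
        raise ValueError("h is not defined when n * var(b3) >= 1")
    rho_hat = 1 - d / 2
    return rho_hat * math.sqrt(n / (1 - n * var_b3))


# Hypothetical inputs, not the Table 12.4 results.
h = durbin_h(d=1.2, n=40, var_b3=0.004)
print(round(h, 3), abs(h) > 1.96)
```

The guard reflects a known limitation of the h test: when $n\operatorname{var}(\hat\beta_3) \ge 1$ the square root is not real, and another test (such as Breusch–Godfrey) is needed.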

12.37. Dummy variables and autocorrelation. Refer to the savings—income regression discussed in Chapter 
9. Using the data given in Table 9.2, and assuming an AR(1) scheme, reestimate the savings—income 
regression, taking into account autocorrelation. Pay close attention to the transformation of the 
dummy variable. Compare your results with those presented in Chapter 9. 

12.38. Using the wages—productivity data given in Table 12.4, estimate model (12.9.8) and compare your 
results with those given in regression (12.9.9). What conclusion(s) do you draw? 


Key to Multiple Choice Questions 


E) 2. (d) 3. (c) 4. (d) 5. (d) 6. (b) 7. (c) 8. (d) 9. (a)
10. (d) 11. (b) 12. (a) 13. (b) 14. (c) 15. (a) 16. (b) WUE) 18. (a)
19. (d) 2020) 22 a a) 24. (d) a 25. (b)


*J. Durbin, “Testing for Serial Correlation in Least-squares Regression When Some of the Regressors Are Lagged Dependent
Variables,” Econometrica, vol. 38, 1970, pp. 410-421.

Appendix 12A

12A.1 Proof that the Error Term $v_t$ in Equation (12.1.11) Is Autocorrelated


Since $v_t = u_t - u_{t-1}$, it is easy to show that $E(v_t) = E(u_t - u_{t-1}) = E(u_t) - E(u_{t-1}) = 0$, since $E(u_t) = 0$ for each t. Now,
$\operatorname{var}(v_t) = \operatorname{var}(u_t - u_{t-1}) = \operatorname{var}(u_t) + \operatorname{var}(u_{t-1}) = 2\sigma^2$, since the variance of each $u_t$ is $\sigma^2$ and the u’s are independently distributed.

Hence, $v_t$ is homoscedastic. But

$$\operatorname{cov}(v_t, v_{t-1}) = E(v_t v_{t-1}) = E[(u_t - u_{t-1})(u_{t-1} - u_{t-2})] = -\sigma^2$$

which is obviously nonzero. Therefore, although the u’s are not autocorrelated, the v’s are.
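A quick simulation illustrates the result. It assumes the $u_t$ are independent $N(0, \sigma^2)$ draws (the proof itself needs only uncorrelated u’s with common variance), so $\operatorname{var}(v_t)$ should be close to $2\sigma^2$ and the first-order autocorrelation of $v_t$ close to $-\sigma^2/(2\sigma^2) = -0.5$. The function name, seed, and sample size are illustrative choices.

```python
import random


def first_difference_autocorr(n=200_000, sigma=1.0, seed=0):
    """Simulate white-noise u_t, form v_t = u_t - u_{t-1}, and return
    (sample variance of v, sample first-order autocorrelation of v)."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, sigma) for _ in range(n + 1)]
    v = [u[t] - u[t - 1] for t in range(1, n + 1)]
    mean = sum(v) / n
    dev = [x - mean for x in v]
    var = sum(x * x for x in dev) / n
    cov1 = sum(dev[t] * dev[t - 1] for t in range(1, n)) / n
    return var, cov1 / var


var_v, r1 = first_difference_autocorr()
# Theoretical values: var(v) = 2*sigma**2 = 2 and corr(v_t, v_{t-1}) = -0.5.
print(round(var_v, 3), round(r1, 3))
```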


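12A.2 Proof of Equations (12.2.3), (12.2.4), and (12.2.5)

Under AR(1),

$$u_t = \rho u_{t-1} + \varepsilon_t \tag{1}$$

Therefore,

$$E(u_t) = \rho E(u_{t-1}) + E(\varepsilon_t) = 0 \tag{2}$$

So,

$$\operatorname{var}(u_t) = \rho^2 \operatorname{var}(u_{t-1}) + \operatorname{var}(\varepsilon_t) \tag{3}$$

because the u’s and ε’s are uncorrelated.
Since $\operatorname{var}(u_t) = \operatorname{var}(u_{t-1}) = \sigma^2$ and $\operatorname{var}(\varepsilon_t) = \sigma_\varepsilon^2$, we get

$$\operatorname{var}(u_t) = \frac{\sigma_\varepsilon^2}{1 - \rho^2} \tag{4}$$

Now multiply Eq. (1) by $u_{t-1}$ and take expectations on both sides to obtain:

$$\operatorname{cov}(u_t, u_{t-1}) = E(u_t u_{t-1}) = E[\rho u_{t-1}^2 + u_{t-1}\varepsilon_t] = \rho E(u_{t-1}^2)$$

Noting that the covariance between $u_{t-1}$ and $\varepsilon_t$ is zero (why?) and that $\operatorname{var}(u_t) = \operatorname{var}(u_{t-1}) = \sigma_\varepsilon^2/(1 - \rho^2)$, we obtain

$$\operatorname{cov}(u_t, u_{t-1}) = \rho \frac{\sigma_\varepsilon^2}{1 - \rho^2} \tag{5}$$

Continuing in this fashion,

$$\operatorname{cov}(u_t, u_{t-2}) = \rho^2 \frac{\sigma_\varepsilon^2}{1 - \rho^2}, \qquad \operatorname{cov}(u_t, u_{t-3}) = \rho^3 \frac{\sigma_\varepsilon^2}{1 - \rho^2}$$

and so on. Now the correlation coefficient is the ratio of covariance to variance. Hence,

$$\operatorname{cor}(u_t, u_{t-1}) = \rho, \qquad \operatorname{cor}(u_t, u_{t-2}) = \rho^2$$

and so on.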
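The stationary moments derived above can be checked by simulation. The sketch below assumes normal ε’s with $\sigma_\varepsilon = 1$ and starts the process from its stationary distribution so that no burn-in period is needed. With ρ = 0.6, the sample variance should be near $1/(1 - 0.36) = 1.5625$ and the lag-1 and lag-2 autocorrelations near 0.6 and 0.36. The function name and settings are illustrative.

```python
import random


def ar1_moments(rho, sigma_eps=1.0, n=200_000, seed=1):
    """Simulate u_t = rho*u_{t-1} + eps_t and return the sample variance
    and the sample autocorrelations at lags 1 and 2."""
    rng = random.Random(seed)
    # Draw u_1 from the stationary distribution N(0, sigma_eps^2 / (1 - rho^2)).
    u = [rng.gauss(0.0, sigma_eps / (1 - rho ** 2) ** 0.5)]
    for _ in range(n - 1):
        u.append(rho * u[-1] + rng.gauss(0.0, sigma_eps))
    mean = sum(u) / n
    dev = [x - mean for x in u]
    var = sum(x * x for x in dev) / n

    def autocorr(k):
        return sum(dev[t] * dev[t - k] for t in range(k, n)) / n / var

    return var, autocorr(1), autocorr(2)


rho = 0.6
var_u, r1, r2 = ar1_moments(rho)
print(round(var_u, 3), 1 / (1 - rho ** 2))   # sample vs. sigma_eps^2 / (1 - rho^2)
print(round(r1, 3), rho, round(r2, 3), rho ** 2)
```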


CHAPTER 13

Econometric Modeling: Model Specification and Diagnostic Testing


Applied econometrics cannot be done mechanically; it needs understanding, intuition and skill.¹


... we generally drive across bridges without worrying about the soundness of their construction because we are
reasonably sure that someone rigorously checked their engineering principles and practice. Economists must do
likewise with models or else attach the warning “not responsible if attempted use leads to collapse.”²


Economists’ search for “truth” has over the years given rise to the view that economists are people searching in a
dark room for a non-existent black cat; econometricians are regularly accused of finding one.³


One of the assumptions of the classical linear regression model (CLRM), Assumption 9, is that the
regression model used in the analysis is “correctly” specified: If the model is not “correctly” specified, we
encounter the problem of model specification error or model specification bias. In this chapter we take a
close and critical look at this assumption, because searching for the correct model is like searching for the
Holy Grail. In particular we examine the following questions:


1. How does one go about finding the “correct” model? In other words, what are the criteria in choosing 
a model for empirical analysis? 

2. What types of model specification errors is one likely to encounter in practice? 

3. What are the consequences of specification errors? 

4. How does one detect specification errors? In other words, what are some of the diagnostic tools that one 
can use? 

5. Having detected specification errors, what remedies can one adopt and with what benefits? 

6. How does one evaluate the performance of competing models? 


¹Keith Cuthbertson, Stephen G. Hall, and Mark P. Taylor, Applied Econometrics Techniques, Michigan University Press, 1992,
p. X.
²David F. Hendry, Dynamic Econometrics, Oxford University Press, U.K., 1995, p. 68.
³Peter Kennedy, A Guide to Econometrics, 3d ed., The MIT Press, Cambridge, Mass., 1992, p. 82.

The topic of model specification and evaluation is vast, and very extensive empirical work has been done
in this area. Not only that, but there are philosophical differences on this topic. Although we cannot do full
justice to this topic in one chapter, we hope to bring out some of the essential issues involved in model
specification and model evaluation.


13.1 Model Selection Criteria 


According to Hendry and Richard, a model chosen for empirical analysis should satisfy the following criteria:⁴

1. Be data admissible; that is, predictions made from the model must be logically possible. 

2. Be consistent with theory; that is, it must make good economic sense. For example, if Milton Fried- 
man’s permanent income hypothesis holds, the intercept value in the regression of permanent consumption 
on permanent income is expected to be zero. 

3. Have weakly exogenous regressors; that is, the explanatory variables, or regressors, must be uncorre- 
lated with the error term. It may be added that in some situations the exogenous regressors may be strictly 
exogenous. A strictly exogenous variable is independent of current, future, and past values of the error term. 

4. Exhibit parameter constancy; that is, the values of the parameters should be stable. Otherwise, 
forecasting will be difficult. As Friedman notes, “The only relevant test of the validity of a hypothesis [model]
is comparison of its predictions with experience.”⁵ In the absence of parameter constancy, such predictions
will not be reliable.

5. Exhibit data coherency; that is, the residuals estimated from the model must be purely random (techni- 
cally, white noise). In other words, if the regression model is adequate, the residuals from this model must be 
white noise. If that is not the case, there is some specification error in the model. Shortly, we will explore the 
nature of specification error(s). 

6. Be encompassing; that is, the model should encompass or include all the rival models in the sense that
it is capable of explaining their results. In short, other models cannot be an improvement over the chosen 
model. 

It is one thing to list criteria of a “good” model and quite another to actually develop it, for in practice one 
is likely to commit various model specification errors, which we discuss in the next section. 


13.2 Types of Specification Errors 


Assume that on the basis of the criteria just listed we arrive at a model that we accept as a good model. To be 
concrete, let this model be 


$$Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + \beta_4 X_i^3 + u_{1i} \tag{13.2.1}$$

where Y = total cost of production and X = output. Equation (13.2.1) is the familiar textbook example of the
cubic total cost function.

But suppose for some reason (say, laziness in plotting the scattergram) a researcher decides to use the
following model:

$$Y_i = \alpha_1 + \alpha_2 X_i + \alpha_3 X_i^2 + u_{2i} \tag{13.2.2}$$

Note that we have changed the notation to distinguish this model from the true model.
⁴D. F. Hendry and J. F. Richard, “The Econometric Analysis of Economic Time Series,” International Statistical Review, vol.
51, 1983, pp. 3-33.
⁵Milton Friedman, “The Methodology of Positive Economics,” in Essays in Positive Economics, University of Chicago Press,
Chicago, 1953, p. 7.

Since Eq. (13.2.1) is assumed true, adopting Eq. (13.2.2) would constitute a specification error, the error
consisting in omitting a relevant variable ($X_i^3$). Therefore, the error term $u_{2i}$ in Eq. (13.2.2) is in fact

$$u_{2i} = u_{1i} + \beta_4 X_i^3 \tag{13.2.3}$$


We shall see shortly the importance of this relationship. 
Now suppose that another researcher uses the following model: 


$$Y_i = \lambda_1 + \lambda_2 X_i + \lambda_3 X_i^2 + \lambda_4 X_i^3 + \lambda_5 X_i^4 + u_{3i} \tag{13.2.4}$$

If Eq. (13.2.1) is the “truth,” Eq. (13.2.4) also constitutes a specification error, the error here consisting in
including an unnecessary or irrelevant variable in the sense that the true model assumes $\lambda_5$ to be zero. The
new error term is in fact

$$u_{3i} = u_{1i} - \lambda_5 X_i^4 = u_{1i} \tag{13.2.5}$$

since $\lambda_5 = 0$ in the true model (Why?)
Now assume that yet another researcher postulates the following model: 
$$\ln Y_i = \gamma_1 + \gamma_2 X_i + \gamma_3 X_i^2 + \gamma_4 X_i^3 + u_{4i} \tag{13.2.6}$$


In relation to the true model, Eq. (13.2.6) would also constitute a specification bias, the bias here being the 
use of the wrong functional form: In Eq. (13.2.1) Y appears linearly, whereas in Eq. (13.2.6) it appears 
log-linearly. 

Finally, consider the researcher who uses the following model: 


$$Y_i^* = \beta_1^* + \beta_2^* X_i^* + \beta_3^* X_i^{*2} + \beta_4^* X_i^{*3} + u_i^* \tag{13.2.7}$$

where $Y_i^* = Y_i + \varepsilon_i$ and $X_i^* = X_i + w_i$, $\varepsilon_i$ and $w_i$ being the errors of measurement. What Eq. (13.2.7) states
is that instead of using the true $Y_i$ and $X_i$ we use their proxies, $Y_i^*$ and $X_i^*$, which may contain errors of
measurement. Therefore, in Eq. (13.2.7) we commit the errors of measurement bias. In applied work data
are plagued by errors of approximations or errors of incomplete coverage or simply errors of omitting some 
observations. In the social sciences we often depend on secondary data and usually have no way of knowing 
the types of errors, if any, made by the primary data-collecting agency. 
Another type of specification error relates to the way the stochastic error $u_i$ (or $u_t$) enters the regression
model. Consider, for instance, the following bivariate regression model without the intercept term:

$$Y_i = \beta X_i u_i \tag{13.2.8}$$

where the stochastic error term enters multiplicatively with the property that $\ln u_i$ satisfies the assumptions of
the CLRM, against the following model

$$Y_i = \alpha X_i + u_i \tag{13.2.9}$$

where the error term enters additively. Although the variables are the same in the two models, we have
denoted the slope coefficient in Eq. (13.2.8) by $\beta$ and the slope coefficient in Eq. (13.2.9) by $\alpha$. Now if
Eq. (13.2.8) is the “correct” or “true” model, would the estimated $\alpha$ provide an unbiased estimate of the true
$\beta$? That is, will $E(\hat\alpha) = \beta$? If that is not the case, improper stochastic specification of the error term will
constitute another source of specification error.

A specification error that is sometimes overlooked is the interaction among the regressors, that is, the 
multiplicative effect of one or more regressors on the regressand. To illustrate, consider the following 
simplified wage function: 


$$\ln W_i = \beta_1 + \beta_2\,\text{Education}_i + \beta_3\,\text{Gender}_i + \beta_4\,(\text{Education}_i)(\text{Gender}_i) + u_i \tag{13.2.10}$$

In this model, the change in the relative wages with respect to education depends not only on education
but also on the gender (i.e., $\partial \ln W_i / \partial\,\text{Education}_i = \beta_2 + \beta_4\,\text{Gender}_i$). Likewise, the change in relative wages with respect to
gender depends not only on gender but also on education.

To sum up, in developing an empirical model, one is likely to commit one or more of the following
specification errors:

1. Omission of a relevant variable(s).
2. Inclusion of an unnecessary variable(s).
3. Adoption of the wrong functional form.
4. Errors of measurement.
5. Incorrect specification of the stochastic error term.
6. Assumption that the error term is normally distributed.


Before turning to an examination of these specification errors in some detail, it may be fruitful to distin- 
guish between model specification errors and model mis-specification errors. The first four types of error 
discussed above are essentially in the nature of model specification errors in that we have in mind a “true” 
model but somehow we do not estimate the correct model. In model mis-specification errors, we do not know 
what the true model is to begin with. In this context one may recall the controversy between the Keynesians 
and the monetarists. The monetarists give primacy to money in explaining changes in GDP, whereas the 
Keynesians emphasize the role of government expenditure to explain changes in GDP. So to speak, these are 
two competing models. 

In what follows, we will first consider model specification errors and then examine model mis-specifi- 
cation errors. 


13.3 Consequences of Model Specification Errors 


Whatever the sources of specification errors, what are the consequences? To keep the discussion simple, we 
will answer this question in the context of the three-variable model and consider in this section the first two 
types of specification errors discussed earlier, namely, (1) underfitting a model, that is, omitting relevant 
variables, and (2) overfitting a model, that is, including unnecessary variables. Our discussion here can be 
easily generalized to more than two regressors, but with tedious algebra; matrix algebra becomes almost a 
necessity once we go beyond the three-variable case. 


Underfitting a Model (Omitting a Relevant Variable) 


Suppose the true model is: 


$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \tag{13.3.1}$$

but for some reason we fit the following model:

$$Y_i = \alpha_1 + \alpha_2 X_{2i} + v_i \tag{13.3.2}$$

The consequences of omitting variable $X_3$ are as follows:

1. If the left-out, or omitted, variable $X_3$ is correlated with the included variable $X_2$, that is, if $r_{23}$, the
correlation coefficient between the two variables, is nonzero, $\hat\alpha_1$ and $\hat\alpha_2$ are biased as well as inconsistent. That
is, $E(\hat\alpha_1) \neq \beta_1$ and $E(\hat\alpha_2) \neq \beta_2$, and the bias does not disappear as the sample size gets larger.


⁶But see Exercise 13.32.

2. Even if $X_2$ and $X_3$ are not correlated, $\hat\alpha_1$ is biased, although $\hat\alpha_2$ is now unbiased.

3. The disturbance variance $\sigma^2$ is incorrectly estimated.

4. The conventionally measured variance of $\hat\alpha_2$ $(= \sigma^2/\sum x_{2i}^2)$ is a biased estimator of the variance of the true estimator $\hat\beta_2$.

5. In consequence, the usual confidence interval and hypothesis-testing procedures are likely to give 
misleading conclusions about the statistical significance of the estimated parameters. 

6. As another consequence, the forecasts based on the incorrect model and the forecast (confidence) 
intervals will be unreliable. 


Although proofs of each of the above statements will take us far afield,⁷ it is shown in Appendix 13A,
Section 13A.1, that

$$E(\hat\alpha_2) = \beta_2 + \beta_3 b_{32} \tag{13.3.3}$$

where $b_{32}$ is the slope in the regression of the excluded variable $X_3$ on the included variable
$X_2$ $(b_{32} = \sum x_{3i}x_{2i}/\sum x_{2i}^2)$. As Eq. (13.3.3) shows, $\hat\alpha_2$ is biased, unless $\beta_3$ or $b_{32}$ or both are zero. We rule
out $\beta_3$ being zero, because in that case we do not have specification error to begin with. The coefficient $b_{32}$
will be zero if $X_2$ and $X_3$ are uncorrelated, which is unlikely in most economic data.
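Equation (13.3.3) has an exact sample counterpart. When the three regressions (full model, short model, and $X_3$ on $X_2$) are fitted by OLS on the same data, $\hat\alpha_2 = \hat\beta_2 + \hat\beta_3 \hat b_{32}$ holds identically, because the full-model residuals are orthogonal to $X_2$. The simulation below uses made-up parameter values ($\beta_2 = 2$, $\beta_3 = 1.5$, and $X_3$ built to have a population slope of 0.8 on $X_2$) to check both the identity and the positive bias expected when $\beta_3$ and $b_{32}$ are both positive. The helper names are illustrative.

```python
import random


def slope(x, y):
    """OLS slope of y on x (regression with an intercept)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    return sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sum((a - xb) ** 2 for a in x)


def two_regressor_slopes(x2, x3, y):
    """Partial slopes (b2, b3) of y on x2 and x3 (with intercept), from the
    deviation-form normal equations of the three-variable model."""
    n = len(y)
    m2, m3, my = sum(x2) / n, sum(x3) / n, sum(y) / n
    d2 = [v - m2 for v in x2]
    d3 = [v - m3 for v in x3]
    dy = [v - my for v in y]
    s22 = sum(a * a for a in d2)
    s33 = sum(a * a for a in d3)
    s23 = sum(a * b for a, b in zip(d2, d3))
    s2y = sum(a * b for a, b in zip(d2, dy))
    s3y = sum(a * b for a, b in zip(d3, dy))
    det = s22 * s33 - s23 ** 2
    return (s2y * s33 - s3y * s23) / det, (s3y * s22 - s2y * s23) / det


rng = random.Random(7)
n = 500
x2 = [rng.gauss(0, 1) for _ in range(n)]
x3 = [0.8 * a + rng.gauss(0, 1) for a in x2]  # X3 positively correlated with X2
y = [1.0 + 2.0 * a + 1.5 * b + rng.gauss(0, 1) for a, b in zip(x2, x3)]

b2, b3 = two_regressor_slopes(x2, x3, y)  # correctly specified model
a2 = slope(x2, y)                         # model omitting X3
b32 = slope(x2, x3)                       # regression of X3 on X2
print(round(a2, 4), round(b2 + b3 * b32, 4), round(b2, 4))
```

The short-regression slope lands near 2 + 1.5 × 0.8 = 3.2 rather than the true 2, which is the positive bias described in the following paragraph.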

Generally, however, the extent of the bias will depend on the bias term $\beta_3 b_{32}$. If, for instance, $\beta_3$ is positive
(i.e., $X_3$ has a positive effect on Y) and $b_{32}$ is positive (i.e., $X_2$ and $X_3$ are positively correlated), $\hat\alpha_2$, on average,
will overestimate the true $\beta_2$ (i.e., positive bias). But this result should not be surprising, for $X_2$ represents
not only its direct effect on Y but also its indirect effect (via $X_3$) on Y. In short, $X_2$ gets credit for the influence
that is rightly attributable to $X_3$, the latter being prevented from showing its effect explicitly because it is not
“allowed” to enter the model. As a concrete example, consider the example discussed in Chapter 7 (Example
7.1):


Example 13.1 Illustrative Example: Child Mortality Revisited 


Regressing child mortality (CM) on per capita GNP (PGNP) and the female literacy rate (FLR), we obtained 
the regression results shown in Eq. (7.6.2), giving the partial slope coefficient values of the two variables 
as —0.0056 and -2.2316, respectively. But if we now drop the FLR variable, we obtain the results shown in 
Eq. (7.7.2). If we regard Eq. (7.6.2) as the correct model, then Eq. (7.7.2) is a mis-specified model in that it 
omits the relevant variable FLR. Now you can see that in the correct model the coefficient of the PGNP variable 
was —0.0056, whereas in the “incorrect” model (7.7.2) it is now —0.0114. 

In absolute terms, PGNP now has a greater impact on CM than in the true model. But if we
regress FLR on PGNP (regression of the excluded variable on the included variable), the slope coefficient in
this regression ($b_{32}$ in terms of Eq. [13.3.3]) is 0.00256.⁸ This suggests that as PGNP increases by a unit, on
average, FLR goes up by 0.00256 units. But if FLR goes up by these units, its effect on CM will be
$(-2.2316)(0.00256) = \hat\beta_3 b_{32} \approx -0.00571$.

Therefore, from Eq. (13.3.3) we finally have $(\hat\beta_2 + \hat\beta_3 b_{32}) = [-0.0056 + (-2.2316)(0.00256)] \approx -0.0113$,
which is about the value of the PGNP coefficient obtained in the incorrect model (7.7.2).⁹ As this example
illustrates, the true impact of PGNP on CM is much smaller in absolute value (-0.0056) than that suggested by the
incorrect model (7.7.2), namely, -0.0114.
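The arithmetic of Eq. (13.3.3) in this example can be reproduced in a few lines. The sketch simply plugs in the three coefficients quoted in the example and in footnote 8; the variable names are illustrative.

```python
# Coefficients quoted in Example 13.1 (Eq. 7.6.2 and footnote 8).
beta2_pgnp = -0.0056   # PGNP coefficient in the correctly specified model
beta3_flr = -2.2316    # FLR coefficient in the correctly specified model
b32 = 0.00256          # slope from regressing FLR on PGNP

indirect_effect = beta3_flr * b32              # effect of PGNP on CM routed through FLR
implied_alpha2 = beta2_pgnp + indirect_effect  # Eq. (13.3.3)
print(round(indirect_effect, 5), round(implied_alpha2, 4))
```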


TE 


For an algebraic treatment, see Jan Kmenta, Elements of Econometrics, Macmillan, New York, 1971, pp. 391-399. Those with a matrix algebra background may want to consult J. Johnston, Econometric Methods, 4th ed., McGraw-Hill, New York, 1997, pp. 119-112.


8The regression results are (standard errors in parentheses):

$$\widehat{\text{FLR}} = \underset{(3.5553)}{47.5971} + \underset{(0.0011)}{0.00256}\,\text{PGNP} \qquad r^2 = 0.0721$$

9Note that in the true model β̂₂ and β̂₃ are unbiased estimates of their true values.
Now let us examine the variances of α̂₂ and β̂₂:

$$\operatorname{var}(\hat{\alpha}_2) = \frac{\sigma^2}{\sum x_{2i}^2} \qquad (13.3.4)$$

$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2\,(1 - r_{23}^2)} = \frac{\sigma^2}{\sum x_{2i}^2}\,\text{VIF} \qquad (13.3.5)$$

where VIF (a measure of collinearity) is the variance inflation factor [= 1/(1 − r²₂₃)] discussed in Chapter 10 and r₂₃ is the correlation coefficient between variables X₂ and X₃; Eqs. (13.3.4) and (13.3.5) are familiar to us from Chapters 3 and 7.

As formulas (13.3.4) and (13.3.5) are not, in general, the same, var(α̂₂) will generally differ from var(β̂₂). But we know that var(β̂₂) is unbiased (why?). Therefore, var(α̂₂) is biased, thus substantiating the statement made in point 4 earlier. Since 0 < r²₂₃ < 1, it would seem that in the present case var(α̂₂) < var(β̂₂). Now we face a dilemma: Although α̂₂ is biased, its variance is smaller than the variance of the unbiased estimator β̂₂ (of course, we are ruling out the case where r₂₃ = 0, since in practice there is some correlation between regressors). So, there is a trade-off involved here.¹⁰

The story is not complete yet, however, for the σ² estimated from model (13.3.2) and that estimated from the true model (13.3.1) are not the same, because the residual sum of squares (RSS) of the two models as well as their degrees of freedom (df) are different. You may recall that we obtain an estimate of σ² as σ̂² = RSS/df, which depends on the number of regressors included in the model as well as the df (= n − number of parameters estimated). Now if we add variables to the model, the RSS generally decreases (recall that as more variables are added to the model, the R² increases), but the degrees of freedom also decrease because more parameters are estimated. The net outcome depends on whether the RSS decreases sufficiently to offset the loss of degrees of freedom due to the addition of regressors. If a regressor has a strong impact on the regressand, for example, if it reduces the RSS by more than enough to offset the degree of freedom lost by its addition, then including it will not only reduce the bias but also increase the precision (i.e., reduce the standard errors) of the estimators.

On the other hand, if the relevant variables have only a marginal impact on the regressand, and if they are highly correlated (i.e., VIF is large), including them may reduce the bias in the coefficients of the variables already in the model but increase their standard errors (i.e., make them less efficient). Indeed, the trade-off in this situation between bias and precision can be substantial. As you can see from this discussion, the trade-off will depend on the relative importance of the various regressors.

To conclude this discussion, let us consider the special case where r₂₃ = 0, that is, X₂ and X₃ are uncorrelated. This will result in b₃₂ being zero (why?). Therefore, it can be seen from Eq. (13.3.3) that α̂₂ is now unbiased.¹¹ Also, it seems from Eqs. (13.3.4) and (13.3.5) that the variances of α̂₂ and β̂₂ are the same. Is it then harmless to drop the variable X₃ from the model even though it may be relevant theoretically? Generally not, for in this case, as noted earlier, var(α̂₂) estimated from Eq. (13.3.4) is still biased, and therefore our hypothesis-testing procedures are likely to remain suspect.¹² Besides, in most economic research X₂ and X₃ will be correlated, thus creating the problems discussed previously. The point is clear: Once a model is formulated on the basis of the relevant theory, one is ill-advised to drop a variable from such a model.
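The bias formula (13.3.3) also holds as an exact in-sample identity for OLS: the slope from the short regression equals β̂₂ + β̂₃b₃₂ computed from the long regression and the auxiliary regression of X₃ on X₂. The standard-library Python sketch below checks that identity on simulated data and reports the VIF that links formulas (13.3.4) and (13.3.5). The parameter values (β₂ = 2, β₃ = 1.5) and the way X₃ is made correlated with X₂ are hypothetical choices for illustration, not part of the text.

```python
import random

def centered(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def cross(a, b):
    return sum(x * y for x, y in zip(a, b))

random.seed(13)
n = 200
beta1, beta2, beta3 = 1.0, 2.0, 1.5               # hypothetical true parameters
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [0.6 * a + random.gauss(0, 1) for a in x2]   # X3 positively correlated with X2
y = [beta1 + beta2 * a + beta3 * b + random.gauss(0, 1) for a, b in zip(x2, x3)]

d2, d3, dy = centered(x2), centered(x3), centered(y)
s22, s33, s23 = cross(d2, d2), cross(d3, d3), cross(d2, d3)
s2y, s3y = cross(d2, dy), cross(d3, dy)

# Correct model (13.3.1): deviation-form OLS slopes with both regressors
det = s22 * s33 - s23 ** 2
beta2_hat = (s2y * s33 - s3y * s23) / det
beta3_hat = (s3y * s22 - s2y * s23) / det

# Mis-specified model (13.3.2): Y on X2 alone
alpha2_hat = s2y / s22

# Auxiliary regression of the omitted X3 on the included X2
b32 = s23 / s22

# VIF = 1/(1 - r23^2): the ratio of formula (13.3.5) to (13.3.4) for a common sigma^2
vif = 1 / (1 - s23 ** 2 / (s22 * s33))
print(f"alpha2_hat                 = {alpha2_hat:.4f}")
print(f"beta2_hat + beta3_hat*b32  = {beta2_hat + beta3_hat * b32:.4f}")
print(f"VIF                        = {vif:.3f}")
```

Because β₃ > 0 and X₂, X₃ are positively correlated here, the short-regression slope sits above β̂₂, which is the positive bias described at the start of this discussion.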


10To bypass the trade-off between bias and efficiency, one could choose to minimize the mean square error (MSE), since it accounts for both bias and efficiency. On MSE, see the statistical appendix, Appendix A. See also Exercise 13.6.


11Note, though, that α̂₁ is still biased, which can be seen intuitively as follows: We know that β̂₁ = Ȳ − β̂₂X̄₂ − β̂₃X̄₃, whereas α̂₁ = Ȳ − α̂₂X̄₂, and even if α̂₂ = β̂₂, the two intercept estimators will not be the same.


12For details, see Adrian C. Darnell, A Dictionary of Econometrics, Edward Elgar Publisher, 1994, pp. 371-372. 
Inclusion of an Irrelevant Variable (Overfitting a Model)


Now let us assume that

$$Y_i = \beta_1 + \beta_2 X_{2i} + u_i \qquad (13.3.6)$$

is the truth, but we fit the following model:

$$Y_i = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + v_i \qquad (13.3.7)$$

and thus commit the specification error of including an unnecessary variable in the model.

The consequences of this specification error are as follows: 

1. The OLS estimators of the parameters of the "incorrect" model are all unbiased and consistent, that is, E(α̂₁) = β₁, E(α̂₂) = β₂, and E(α̂₃) = β₃ = 0.

2. The error variance σ² is correctly estimated.

3. The usual confidence interval and hypothesis-testing procedures remain valid.

4. However, the estimated α's will be generally inefficient, that is, their variances will be generally larger than those of the β̂'s of the true model. The proofs of some of these statements can be found in Appendix 13A, Section 13A.2. The point of interest here is the relative inefficiency of the α̂'s. This can be shown easily.

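From the usual OLS formula we know that

$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2} \qquad (13.3.8)$$

and

$$\operatorname{var}(\hat{\alpha}_2) = \frac{\sigma^2}{\sum x_{2i}^2\,(1 - r_{23}^2)} \qquad (13.3.9)$$

Therefore,

$$\frac{\operatorname{var}(\hat{\alpha}_2)}{\operatorname{var}(\hat{\beta}_2)} = \frac{1}{1 - r_{23}^2} \qquad (13.3.10)$$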
Since 0 ≤ r²₂₃ < 1, it follows that var(α̂₂) ≥ var(β̂₂); that is, the variance of α̂₂ is generally greater than the variance of β̂₂ even though, on average, α̂₂ = β₂ [i.e., E(α̂₂) = β₂].

The implication of this finding is that the inclusion of the unnecessary variable X₃ makes the variance of α̂₂ larger than necessary, thereby making α̂₂ less precise. This is also true of α̂₁.
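A quick numerical check of Eq. (13.3.10): in deviation form, var(α̂₂) in the overfitted model (13.3.7) is σ² times the first diagonal element of the inverse of the 2 × 2 cross-product matrix of x₂ and x₃. The Python sketch below computes that element for two short, made-up regressor series (the data are hypothetical) and compares the resulting variance ratio with 1/(1 − r²₂₃). Because σ² cancels in the ratio, it is set to 1.

```python
# Relative inefficiency from including an irrelevant regressor, Eq. (13.3.10).
# sigma^2 cancels in the ratio, so it is set to 1 throughout.

x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # hypothetical included regressor
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0]   # hypothetical irrelevant regressor

def centered(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

d2, d3 = centered(x2), centered(x3)
s22 = sum(a * a for a in d2)
s33 = sum(b * b for b in d3)
s23 = sum(a * b for a, b in zip(d2, d3))

var_beta2 = 1.0 / s22                        # Eq. (13.3.8): true model, X2 only
var_alpha2 = s33 / (s22 * s33 - s23 ** 2)    # inverse cross-product element for X2 in (13.3.7)
r23_sq = s23 ** 2 / (s22 * s33)

print(f"r23^2                  = {r23_sq:.4f}")
print(f"var(alpha2)/var(beta2) = {var_alpha2 / var_beta2:.4f}")
print(f"1/(1 - r23^2)          = {1 / (1 - r23_sq):.4f}")
```

With these two strongly correlated series the ratio is well above 1, which is the loss of precision the text attributes to the superfluous X₃.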

Notice the asymmetry in the two types of specification biases we have considered. If we exclude a relevant variable, the coefficients of the variables retained in the model are generally biased as well as inconsistent, the error variance is incorrectly estimated, and the usual hypothesis-testing procedures become invalid. On the other hand, including an irrelevant variable in the model still gives us unbiased and consistent estimates of the coefficients in the true model, the error variance is correctly estimated, and the conventional hypothesis-testing methods are still valid; the only penalty we pay for the inclusion of the superfluous variable is that the estimated variances of the coefficients are larger, and as a result our probability inferences about the parameters are less precise. One might be tempted to conclude that it is better to include irrelevant variables than to omit the relevant ones. But this philosophy is not to be espoused, because the addition of unnecessary variables will lead to a loss in the efficiency of the estimators and may also lead to the problem of multicollinearity (why?), not to mention the loss of degrees of freedom. Therefore, as Intriligator advises:


In general, the best approach is to include only explanatory variables that, on theoretical grounds, directly influence the dependent variable and that are not accounted for by other included variables.¹³


13Michael D. Intriligator, Econometric Models, Techniques and Applications, Prentice Hall, Englewood Cliffs, NJ, 1978, p. 189.
Recall the Occam’s razor principle. 
13.4 Tests of Specification Errors


Knowing the consequences of specification errors is one thing but finding out whether one has committed 
such errors is quite another, for we do not deliberately set out to commit such errors. Very often specification 
biases arise inadvertently, perhaps from our inability to formulate the model as precisely as possible because 
the underlying theory is weak or because we do not have the right kind of data to test the model. As Davidson notes, "Because of the non-experimental nature of economics, we are never sure how the observed data were generated. The test of any hypothesis in economics always turns out to depend on additional assumptions necessary to specify a reasonably parsimonious model, which may or may not be justified."¹⁴

The practical question then is not why specification errors are made, for they generally are, but how to 
detect them. Once it is found that specification errors have been made, the remedies often suggest themselves. 
If, for example, it can be shown that a variable is inappropriately omitted from a model, the obvious remedy
is to include that variable in the analysis, assuming, of course, the data on that variable are available. 

In this section we discuss some tests that one may use to detect specification errors. 


Detecting the Presence of Unnecessary Variables (Overfitting a Model) 


Suppose we develop a k-variable model to explain a phenomenon: 


$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \qquad (13.4.1)$$


However, we are not totally sure that, say, the variable Xₖ really belongs in the model. One simple way to find this out is to test the significance of the estimated βₖ with the usual t test: t = β̂ₖ/se(β̂ₖ). But suppose that we are not sure whether, say, X₃ and X₄ legitimately belong in the model. This can be easily ascertained by the F test discussed in Chapter 8. Thus, detecting the presence of an irrelevant variable (or variables) is not a difficult task.

It is, however, very important to remember that in carrying out these tests of significance we have a specific model in mind. We accept that model as the maintained hypothesis or the "truth," however tentative it may be. Given that model, then, we can find out whether one or more regressors are really relevant by the usual t and F tests. But note carefully that we should not use the t and F tests to build a model iteratively, that is, we should not say that initially Y is related to X₂ only because β̂₂ is statistically significant and then expand the model to include X₃ and decide to keep that variable in the model if β̂₃ turns out to be statistically significant, and so on. This strategy of building a model is called the bottom-up approach (starting with a smaller model and expanding it as one goes along) or, by the somewhat pejorative term, data mining (other names are regression fishing, data grubbing, data snooping, and number crunching).

The primary objective of data mining is to develop the "best" model after several diagnostic tests so that the model finally chosen is a "good" model in the sense that all the estimated coefficients have the "right" signs, they are statistically significant on the basis of the t and F tests, the R² value is reasonably high, the Durbin–Watson d has an acceptable value (around 2), etc. The purists in the profession look down on the practice of data mining. In the words of William Poole, ". . . making an empirical regularity the foundation, rather than an implication of economic theory, is always dangerous."¹⁵ One reason for "condemning" data mining is as follows.


14James Davidson, Econometric Theory, Blackwell Publishers, Oxford, U.K., 2000, p. 153.
15William Poole, "Is Inflation Too Low?" Cato Journal, vol. 18, no. 3, Winter 1999, p. 456.
Nominal versus True Level of Significance in the Presence of Data Mining


A danger of data mining that the unwary researcher faces is that the conventional levels of significance (α), such as 1, 5, or 10 percent, are not the true levels of significance. Lovell has suggested that if there are c candidate regressors out of which k are finally selected (k ≤ c) on the basis of data mining, then the true level of significance (α*) is related to the nominal level of significance (α) as follows:¹⁶

$$\alpha^* = 1 - (1 - \alpha)^{c/k} \qquad (13.4.2)$$

or approximately as

$$\alpha^* \approx (c/k)\,\alpha \qquad (13.4.3)$$


For example, if c = 15, k = 5, and a = 5 percent, from Eq. (13.4.3) the true level of significance is (15/5) 
(5) = 15 percent. Therefore, if a researcher data-mines and selects 5 out of 15 regressors and reports only 
the results of the condensed model at the nominal 5 percent level of significance and declares that the results 
are statistically significant, one should take this conclusion with a big grain of salt, for we know the (true) 
level of significance is in fact 15 percent. It should be noted that if c = k, that is, there is no data mining, the 
true and nominal levels of significance are the same. Of course, in practice most researchers report only the 
results of their “final” regression without necessarily telling about all the data mining, or pretesting, that has 
gone before.¹⁷
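Lovell's adjustment is easy to tabulate. The Python sketch below (function names ours) evaluates both the exact expression (13.4.2) and the approximation (13.4.3). For the text's example, c = 15, k = 5, and α = 5 percent, the exact value is about 14.3 percent, close to the 15 percent given by the approximation.

```python
def lovell_true_alpha(alpha, c, k):
    """Exact form, Eq. (13.4.2): alpha* = 1 - (1 - alpha) ** (c / k)."""
    return 1 - (1 - alpha) ** (c / k)

def lovell_approx_alpha(alpha, c, k):
    """First-order approximation, Eq. (13.4.3): alpha* is about (c / k) * alpha."""
    return (c / k) * alpha

# (c, k) pairs: the text's example, no data mining (c = k), and heavier mining
for c, k in [(15, 5), (10, 10), (20, 4)]:
    exact = lovell_true_alpha(0.05, c, k)
    approx = lovell_approx_alpha(0.05, c, k)
    print(f"c = {c:2d}, k = {k:2d}: exact {exact:.4f}, approximate {approx:.4f}")
```

The approximation drifts upward as (c/k)α grows: for c = 20 and k = 4 the exact value is about 0.226 against 0.25 from Eq. (13.4.3).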

Despite some of its obvious drawbacks, there is increasing recognition, especially among applied econometricians, that the purist (i.e., non-data-mining) approach to model building is not tenable. As Zaman notes:


Unfortunately, experience with real data sets shows that such a [purist approach] is neither feasible nor desirable. 
It is not feasible because it is a rare economic theory which leads to a unique model. It is not desirable because a 
crucial aspect of learning from the data is learning what types of models are and are not supported by data. Even 
if, by rare luck, the initial model shows a good fit, it is frequently important to explore and learn the types of the 
models the data does or does not agree with.¹⁸


A similar view is expressed by Kerry Patterson, who maintains that: 


This [data mining] approach suggests that economic theory and empirical specification [should] interact rather than be kept in separate compartments.¹⁹


Instead of getting caught in the data mining versus the purist approach to model-building controversy, one 
can endorse the view expressed by Peter Kennedy: 


[that model specification] needs to be a well-thought-out combination of theory and data, and that testing procedures used in specification searches should be designed to minimize the costs of data mining. Examples of such procedures are setting aside data for out-of-sample prediction tests, adjusting significance levels [a la Lovell], and avoiding questionable criteria such as maximizing R².²⁰


If we look at data mining in a broader perspective as a process of discovering empirical regularities that 
might suggest errors and/or omissions in (existing) theoretical models, it has a very useful role to play. To 
quote Kennedy again, "The art of the applied econometrician is to allow for data-driven theory while avoiding the considerable dangers in data mining."²¹


16M. Lovell, “Data Mining,” Review of Economics and Statistics, vol. 65, 1983, pp. 1-12. 


17For a detailed discussion of pretesting and the biases it can lead to, see T. D. Wallace, “Pretest Estimation in Regression: A 
Survey,” American Journal of Agricultural Economics, vol. 59, 1977, pp. 431-443. 


18Asad Zaman, Statistical Foundations for Econometric Techniques, Academic Press, New York, 1996, p. 226. 
19Kerry Patterson, An Introduction to Applied Econometrics, St. Martin's Press, New York, 2000, p. 10.


20Peter Kennedy, "Sinning in the Basement: What Are the Rules? The Ten Commandments of Applied Econometrics," unpublished manuscript.


21Kennedy, op. cit., p. 13.
Tests for Omitted Variables and Incorrect Functional Form


In practice we are never sure that the model adopted for empirical testing is “the truth, the whole truth and 
nothing but the truth.” On the basis of theory or introspection and prior empirical work, we develop a model 
that we believe captures the essence of the subject under study. We then subject the model to empirical 
testing. After we obtain the results, we begin the post-mortem, keeping in mind the criteria of a good model 
discussed earlier. It is at this stage that we come to know if the chosen model is adequate. In determining 
model adequacy, we look at some broad features of the results, such as the R² value, the estimated t ratios, the signs of the estimated coefficients in relation to their prior expectations, the Durbin–Watson statistic, and the like. If these diagnostics are reasonably good, we proclaim that the chosen model is a fair representation of reality. By the same token, if the results do not look encouraging because the R² value is too low or because
very few coefficients are statistically significant or have the correct signs or because the Durbin—Watson d 
is too low, then we begin to worry about model adequacy and look for remedies: Maybe we have omitted an 
important variable, or have used the wrong functional form, or have not first-differenced the time series (to 
remove serial correlation), and so on. To aid us in determining whether model inadequacy is on account of 
one or more of these problems, we can use some of the following methods. 


Examination of Residuals 


As noted in Chapter 12, examination of the residuals is a good visual diagnostic to detect autocorrelation or 
heteroscedasticity. But these residuals can also be examined, especially in cross-sectional data, for model 
specification errors, such as omission of an important variable or incorrect functional form. If in fact there are 
such errors, a plot of the residuals will exhibit distinct patterns. 

To illustrate, let us reconsider the cubic total cost of production function first considered in Chapter 7. 
Assume that the true total cost function is described as follows, where Y = total cost and X = output: 


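$$Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + \beta_4 X_i^3 + u_i \qquad (13.4.4)$$

but a researcher fits the following quadratic function:

$$Y_i = \alpha_1 + \alpha_2 X_i + \alpha_3 X_i^2 + u_{2i} \qquad (13.4.5)$$

and another researcher fits the following linear function:

$$Y_i = \lambda_1 + \lambda_2 X_i + u_{3i} \qquad (13.4.6)$$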


Although we know that both researchers have made specification errors, for pedagogical purposes let us see 
how the estimated residuals look in the three models. (The cost-output data are given in Table 7.4.) Figure 
13.1 speaks for itself: As we move from left to right, that is, as we approach the truth, not only are the 
residuals smaller (in absolute value) but also they do not exhibit the pronounced cyclical swings associated 
with the misfitted models. 

The utility of examining the residual plot is thus clear: If there are specification errors, the residuals will 
exhibit noticeable patterns. 
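To see the residual patterns numerically, the Python sketch below refits the three cost functions using exact rational arithmetic (the standard `fractions` module), so the ten-observation sample needs no matrix library. The data are an assumption on our part: we take the Table 7.4 cost-output figures to be Y = 193, 226, 240, 244, 257, 260, 274, 297, 350, 420 for output X = 1, ..., 10. These values reproduce the linear fit 166.467 + 19.933X and the residuals in Table 13.1, but readers should check them against Table 7.4. The helper names `ols` and `poly_residuals` are ours.

```python
from fractions import Fraction

# ASSUMED data: Table 7.4 cost-output figures (X = output, Y = total cost).
X = list(range(1, 11))
Y = [193, 226, 240, 244, 257, 260, 274, 297, 350, 420]

def ols(columns, y):
    """Exact OLS coefficients: solve the normal equations (C'C)b = C'y by Gauss-Jordan."""
    k = len(columns)
    # Augmented matrix [C'C | C'y] built from Fractions, so there is no rounding error
    aug = [[sum(Fraction(a) * b for a, b in zip(columns[r], columns[c])) for c in range(k)]
           + [sum(Fraction(a) * b for a, b in zip(columns[r], y))]
           for r in range(k)]
    for p in range(k):
        pivot_row = next(r for r in range(p, k) if aug[r][p] != 0)
        aug[p], aug[pivot_row] = aug[pivot_row], aug[p]
        pivot = aug[p][p]
        aug[p] = [v / pivot for v in aug[p]]
        for r in range(k):
            if r != p:
                factor = aug[r][p]
                aug[r] = [vr - factor * vp for vr, vp in zip(aug[r], aug[p])]
    return [aug[r][k] for r in range(k)]

def poly_residuals(degree):
    """Residuals from regressing Y on 1, X, ..., X**degree."""
    cols = [[x ** d for x in X] for d in range(degree + 1)]
    b = ols(cols, Y)
    return [y - sum(bj * col[i] for bj, col in zip(b, cols)) for i, y in enumerate(Y)]

for name, degree in [("linear", 1), ("quadratic", 2), ("cubic", 3)]:
    print(f"{name:9s}", [round(float(u), 3) for u in poly_residuals(degree)])
```

Solving the normal equations exactly is only sensible for toy problems like this one; for real data a QR-based least-squares routine is the usual choice.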


The Durbin-Watson d Statistic Once Again 


If we examine the routinely calculated Durbin–Watson d in Table 13.1, we see that for the linear cost function the estimated d is 0.716, suggesting that there is positive "correlation" in the estimated residuals: for n = 10 and k′ = 1, the 5 percent critical d values are d_L = 0.879 and d_U = 1.320. Likewise, the computed d value for the quadratic cost function is 1.038, whereas the 5 percent critical values are d_L = 0.697 and d_U = 1.641, indicating indecision. But if we use the modified d test (see Chapter 12), we can say that there is positive
Figure 13.1 Residuals ûᵢ from (a) linear, (b) quadratic, and (c) cubic total cost functions (each panel plots the residuals against output).


Table 13.1 Estimated Residuals from the Linear, Quadratic, and Cubic Total Cost Functions

Observation    Linear      Quadratic    Cubic
Number         Model*      Model†       Model**
 1               6.600     −23.900      −0.222
 2              19.667       9.500       1.607
 3              13.733      18.817      −0.915
 4              −2.200      13.050      −4.426
 5              −9.133      11.200       4.435
 6             −26.067      −5.733       1.032
 7             −32.000     −16.750       0.726
 8             −28.933     −23.850      −4.119
 9               4.133      −6.033       1.859
10              54.200      23.700       0.022

*Ŷᵢ = 166.467 + 19.933Xᵢ
   se = (19.021)  (3.066)                        R² = 0.8409
   t  = (8.752)   (6.502)                        R̄² = 0.8210, d = 0.716
†Ŷᵢ = 222.383 − 8.0250Xᵢ + 2.542Xᵢ²
   se = (23.488)  (9.809)   (0.869)              R² = 0.9284
   t  = (9.468)   (−0.818)  (2.925)              R̄² = 0.9079, d = 1.038
**Ŷᵢ = 141.767 + 63.478Xᵢ − 12.962Xᵢ² + 0.939Xᵢ³
   se = (6.375)   (4.778)   (0.9856)   (0.0592)  R² = 0.9983
   t  = (22.238)  (13.285)  (−13.151)  (15.861)  R̄² = 0.9975, d = 2.70


"correlation" in the residuals, for the computed d is less than d_U. For the cubic cost function, the true specification, the estimated d value does not indicate any positive "correlation" in the residuals.²²


22In the present context, a value of d = 2 will mean no specification error. (Why?)


Econometric Modeling: Model Specification and Diagnostic Testing 503 


The observed positive "correlation" in the residuals when we fit the linear or quadratic model is not a measure of (first-order) serial correlation but of (model) specification error(s). The observed correlation simply reflects the fact that some variable(s) that belongs in the model is included in the error term and needs to be culled out from it and introduced in its own right as an explanatory variable: If we exclude Xᵢ³ from the cost function, then as Eq. (13.2.3) shows, the error term in the mis-specified model (13.2.2) is in fact (u₂ᵢ + β₄Xᵢ³), and it will exhibit a systematic pattern (e.g., positive autocorrelation) if Xᵢ³ in fact affects Y significantly.

To use the Durbin—Watson test for detecting model specification error(s), we proceed as follows: 


1. From the assumed model, obtain the ordinary least squares (OLS) residuals. 

2. If it is believed that the assumed model is mis-specified because it excludes a relevant explanatory variable, say, Z, from the model, order the residuals obtained in Step 1 according to increasing values of Z. Note: The Z variable could be one of the X variables included in the assumed model or it could be some function of that variable, such as X² or X³.

3. Compute the d statistic from the residuals thus ordered by the usual d formula, namely,

$$d = \frac{\sum_{t=2}^{n} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{n} \hat{u}_t^2}$$

Note: The subscript t is the index of observation here and does not necessarily mean that the data are time series.

4. From the Durbin—Watson tables, if the estimated d value is significant, then one can accept the 
hypothesis of model mis-specification. If that turns out to be the case, the remedial measures will naturally 
suggest themselves. 

In our cost example, the Z (= X) variable (output) was already ordered.²³ Therefore, we do not have to compute the d statistic afresh. As we have seen, the d statistic for both the linear and quadratic cost functions suggests specification errors. The remedies are clear: Introduce the quadratic and cubic terms in the linear cost function and the cubic term in the quadratic cost function. In short, run the cubic cost model.
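The d formula in Step 3 is easy to code. The Python sketch below (the function name is ours) applies it to the three residual columns of Table 13.1, which are already ordered by output. Up to the rounding of the tabulated residuals, it returns about 0.716, 1.038, and 2.70, the d values reported under the table.

```python
def durbin_watson(u):
    """d = sum_{t=2}^{n} (u_t - u_{t-1})^2 / sum_{t=1}^{n} u_t^2, for residuals ordered by Z."""
    num = sum((u[t] - u[t - 1]) ** 2 for t in range(1, len(u)))
    den = sum(e ** 2 for e in u)
    return num / den

# Residuals from Table 13.1, ordered by output X (observations 1 to 10)
linear = [6.600, 19.667, 13.733, -2.200, -9.133, -26.067, -32.000, -28.933, 4.133, 54.200]
quadratic = [-23.900, 9.500, 18.817, 13.050, 11.200, -5.733, -16.750, -23.850, -6.033, 23.700]
cubic = [-0.222, 1.607, -0.915, -4.426, 4.435, 1.032, 0.726, -4.119, 1.859, 0.022]

for name, u in [("linear", linear), ("quadratic", quadratic), ("cubic", cubic)]:
    print(f"{name:9s} d = {durbin_watson(u):.3f}")
```

If the data were not already sorted by the suspected omitted variable Z, one would sort the residuals by Z before calling the function, as Step 2 prescribes.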




Ramsey’s RESET Test 


Ramsey has proposed a general test of specification error called RESET (regression specification error test).²⁴
Here we will illustrate only the simplest version of the test. To fix ideas, let us continue with our cost-output 
example and assume that the cost function is linear in output as 


$$Y_i = \lambda_1 + \lambda_2 X_i + u_{3i} \qquad (13.4.6)$$


where Y = total cost and X = output. Now if we plot the residuals ûᵢ obtained from this regression against Ŷᵢ, the estimated Yᵢ from this model, we get the picture shown in Figure 13.2. Although Σûᵢ and ΣûᵢŶᵢ are necessarily zero (why? see Chapter 3), the residuals in this figure show a pattern in which their mean changes systematically with Ŷᵢ. This would suggest that if we introduce Ŷᵢ in some form as a regressor(s) in Eq. (13.4.6), it should increase R². And if the increase in R² is statistically significant (on the basis of the F test discussed in Chapter 8), it would suggest that the linear cost function (13.4.6) was mis-specified. This is essentially the idea behind RESET. The steps involved in RESET are as follows:


23It does not matter if we order ûᵢ according to Xᵢ² or Xᵢ³, since these are functions of X, which is already ordered.


24J. B. Ramsey, "Tests for Specification Errors in Classical Linear Least Squares Regression Analysis," Journal of the Royal Statistical Society, series B, vol. 31, 1969, pp. 350-371.
Figure 13.2 Residuals ûᵢ and estimated Ŷᵢ from the linear cost function: Yᵢ = λ₁ + λ₂Xᵢ + u₃ᵢ.


1. From the chosen model, e.g., Eq. (13.4.6), obtain the estimated Yᵢ, that is, Ŷᵢ.

2. Rerun Eq. (13.4.6) introducing Ŷᵢ in some form as an additional regressor(s). From Figure 13.2, we observe that there is a curvilinear relationship between ûᵢ and Ŷᵢ, suggesting that one can introduce Ŷᵢ² and Ŷᵢ³ as additional regressors. Thus, we run

$$Y_i = \beta_1 + \beta_2 X_i + \beta_3 \hat{Y}_i^2 + \beta_4 \hat{Y}_i^3 + u_i \qquad (13.4.7)$$

3. Let the R² obtained from Eq. (13.4.7) be R²_new and that obtained from Eq. (13.4.6) be R²_old. Then we can use the F test first introduced in Eq. (8.4.18), namely,

$$F = \frac{(R^2_{\text{new}} - R^2_{\text{old}})/\text{number of new regressors}}{(1 - R^2_{\text{new}})/(n - \text{number of parameters in the new model})} \qquad (8.4.18)$$

to find out if the increase in R² from using Eq. (13.4.7) is statistically significant.

4. If the computed F value is significant, say, at the 5 percent level, one can accept the hypothesis that the model (13.4.6) is mis-specified.
Returning to our illustrative example, we have the following results (standard errors in parentheses): 


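$$\hat{Y}_i = \underset{(19.021)}{166.467} + \underset{(3.066)}{19.933}\,X_i \qquad R^2 = 0.8409 \qquad (13.4.8)$$

$$\hat{Y}_i = \underset{(132.0044)}{2140.7223} + \underset{(33.3951)}{476.6557}\,X_i - \underset{(0.00620)}{0.09187}\,\hat{Y}_i^2 + \underset{(0.0000074)}{0.000119}\,\hat{Y}_i^3 \qquad R^2 = 0.9983 \qquad (13.4.9)$$

Note: Ŷᵢ² and Ŷᵢ³ in Eq. (13.4.9) are obtained from Eq. (13.4.8).

Now applying the F test, we find

$$F = \frac{(0.9983 - 0.8409)/2}{(1 - 0.9983)/(10 - 4)} \approx 284.4035 \qquad (13.4.10)$$

(The value 284.4035 is presumably computed from the unrounded R² values; plugging in the rounded values shown gives about 277.8, which leads to the same conclusion.)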


The reader can easily verify that this F value is highly significant, indicating that the model (13.4.8) is mis-specified. Of course, we have reached the same conclusion on the basis of the visual examination of the residuals as well as the Durbin–Watson d value. It should be added that, since Ŷᵢ is estimated, it is a random variable and, therefore, the usual tests of significance apply if the sample is reasonably large.
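The four RESET steps can be scripted directly. The Python sketch below again assumes that the Table 7.4 data are Y = 193, 226, 240, 244, 257, 260, 274, 297, 350, 420 for X = 1, ..., 10 (please verify against Table 7.4) and repeats the exact-arithmetic helper `ols` (our own name) so the block runs on its own. Because Ŷᵢ is linear in Xᵢ, adding Ŷᵢ² and Ŷᵢ³ spans the same columns as adding Xᵢ² and Xᵢ³, so R²_new here equals the R² of the cubic cost function in Table 13.1. With exact arithmetic the F value comes out at about 284.40, in line with Eq. (13.4.10), while the rounded R² values give about 277.8.

```python
from fractions import Fraction

# ASSUMED data: Table 7.4 cost-output figures (X = output, Y = total cost).
X = list(range(1, 11))
Y = [193, 226, 240, 244, 257, 260, 274, 297, 350, 420]

def ols(columns, y):
    """Exact OLS via the normal equations, solved by Gauss-Jordan with Fractions."""
    k = len(columns)
    aug = [[sum(Fraction(a) * b for a, b in zip(columns[r], columns[c])) for c in range(k)]
           + [sum(Fraction(a) * b for a, b in zip(columns[r], y))]
           for r in range(k)]
    for p in range(k):
        pivot_row = next(r for r in range(p, k) if aug[r][p] != 0)
        aug[p], aug[pivot_row] = aug[pivot_row], aug[p]
        pivot = aug[p][p]
        aug[p] = [v / pivot for v in aug[p]]
        for r in range(k):
            if r != p:
                factor = aug[r][p]
                aug[r] = [vr - factor * vp for vr, vp in zip(aug[r], aug[p])]
    return [aug[r][k] for r in range(k)]

def fitted(columns, b):
    return [sum(bj * col[i] for bj, col in zip(b, columns)) for i in range(len(columns[0]))]

def r_squared(y, yhat):
    ybar = Fraction(sum(y), len(y))
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    tss = sum((yi - ybar) ** 2 for yi in y)
    return 1 - rss / tss

ones = [1] * len(X)

# Step 1: fit the linear model (13.4.6) and keep its fitted values
old_cols = [ones, X]
y_hat = fitted(old_cols, ols(old_cols, Y))
r2_old = r_squared(Y, y_hat)

# Step 2: add y_hat**2 and y_hat**3 as regressors, Eq. (13.4.7)
new_cols = old_cols + [[v ** 2 for v in y_hat], [v ** 3 for v in y_hat]]
r2_new = r_squared(Y, fitted(new_cols, ols(new_cols, Y)))

# Step 3: F test of Eq. (8.4.18): 2 new regressors, 4 parameters in the new model
n, new_regressors, new_params = len(Y), 2, 4
F = ((r2_new - r2_old) / new_regressors) / ((1 - r2_new) / (n - new_params))
print(f"R2_old = {float(r2_old):.4f}  R2_new = {float(r2_new):.4f}  F = {float(F):.2f}")

# The same formula applied to the rounded R2 values printed in the text
F_rounded = ((0.9983 - 0.8409) / 2) / ((1 - 0.9983) / (10 - 4))
print(f"F from rounded R2 values = {F_rounded:.2f}")
```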

One advantage of RESET is that it is easy to apply, for it does not require one to specify what the alternative model is. But that is also its disadvantage because knowing that a model is mis-specified does not help
us necessarily in choosing a better alternative. 

As one author notes: 


In practice, the RESET test may not be particularly good at detecting any specific alternative to a proposed model, 
and its usefulness lies in acting as a general indicator that something is wrong. For this reason, a test such as 
RESET is sometimes described as a test of misspecification, as opposed to a test of specification. This distinction 
is rather subtle, but the basic idea is that a specification test looks at some particular aspect of a given equation, 
with clear null and alternative hypotheses in mind. A misspecification test, on the other hand, can detect a range 
of alternatives and indicate that something is wrong under the null, without necessarily giving clear guidance as to 
what alternative hypothesis is appropriate.²⁵


Lagrange Multiplier (LM) Test for Adding Variables 


This is an alternative to Ramsey’s RESET test. To illustrate this test, we will continue with the preceding 
illustrative example. 

If we compare the linear cost function (13.4.6) with the cubic cost function (13.4.4), the former is a 
restricted version of the latter (recall our discussion of restricted least squares from Chapter 8). The 
restricted regression (13.4.6) assumes that the coefficients of the squared and cubed output terms are equal to 
zero. To test this, the LM test proceeds as follows: 

1. Estimate the restricted regression (13.4.6) by OLS and obtain the residuals, ûᵢ.

2. If in fact the unrestricted regression (13.4.4) is the true regression, the residuals obtained in Eq. (13.4.6) should be related to the squared and cubed output terms, that is, Xᵢ² and Xᵢ³.

3. This suggests that we regress the ûᵢ obtained in Step 1 on all the regressors (including those in the restricted regression), which in the present case means

$$\hat{u}_i = \alpha_1 + \alpha_2 X_i + \alpha_3 X_i^2 + \alpha_4 X_i^3 + v_i \qquad (13.4.11)$$

where vᵢ is an error term with the usual properties.

4. For large-sample size, Engle has shown that n (the sample size) times the R² estimated from the (auxiliary) regression (13.4.11) follows the chi-square distribution with df equal to the number of restrictions


25Jon Stewart and Len Gill, Econometrics, 2d ed., Prentice-Hall Europe, 1998, p. 69.
imposed by the restricted regression, two in the present example since the terms Xᵢ² and Xᵢ³ are dropped from the model.²⁶ Symbolically, we write

$$nR^2 \underset{\text{asy}}{\sim} \chi^2_{(\text{number of restrictions})} \qquad (13.4.12)$$


where asy means asymptotically, that is, in large samples. 

5. If the chi-square value obtained from Eq. (13.4.12) exceeds the critical chi-square value at the chosen 
level of significance, we reject the restricted regression. Otherwise, we do not reject it.

For our example, the regression results are as follows: 


$$\hat{Y}_i = 166.467 + 19.933X_i \qquad (13.4.13)$$


where Y is total cost and X is output. The standard errors for this regression are already given in Table 13.1. 
When the residuals from Eq. (13.4.13) are regressed as just suggested in Step 3, we obtain the following
results: 


$$\hat{u}_i = \underset{(6.375)}{-24.7} + \underset{(4.779)}{43.5443}\,X_i - \underset{(0.986)}{12.9615}\,X_i^2 + \underset{(0.059)}{0.9396}\,X_i^3 \qquad R^2 = 0.9896 \qquad (13.4.14)$$


Although our sample size of 10 is by no means large, just to illustrate the LM mechanism, we obtain nR² = (10)(0.9896) = 9.896. From the chi-square table we observe that for 2 df the 1 percent critical chi-square value is about 9.21. Therefore, the observed value of 9.896 is significant at the 1 percent level, and our conclusion would be to reject the restricted regression (i.e., the linear cost function). We reached a similar conclusion on the basis of Ramsey's RESET test.
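Under the same data assumption as before (Table 7.4 taken to be Y = 193, 226, 240, 244, 257, 260, 274, 297, 350, 420 for X = 1, ..., 10), the LM steps amount to one restricted fit and one auxiliary regression. The Python sketch below repeats our exact-arithmetic helper so it runs on its own. For 2 df the chi-square upper-tail probability has the closed form exp(−x/2), which lets the sketch report a p-value without a statistics library; the implied 1 percent critical value, −2 ln 0.01 ≈ 9.21, matches the table value used above.

```python
import math
from fractions import Fraction

# ASSUMED data: Table 7.4 cost-output figures (X = output, Y = total cost).
X = list(range(1, 11))
Y = [193, 226, 240, 244, 257, 260, 274, 297, 350, 420]

def ols(columns, y):
    """Exact OLS via the normal equations, solved by Gauss-Jordan with Fractions."""
    k = len(columns)
    aug = [[sum(Fraction(a) * b for a, b in zip(columns[r], columns[c])) for c in range(k)]
           + [sum(Fraction(a) * b for a, b in zip(columns[r], y))]
           for r in range(k)]
    for p in range(k):
        pivot_row = next(r for r in range(p, k) if aug[r][p] != 0)
        aug[p], aug[pivot_row] = aug[pivot_row], aug[p]
        pivot = aug[p][p]
        aug[p] = [v / pivot for v in aug[p]]
        for r in range(k):
            if r != p:
                factor = aug[r][p]
                aug[r] = [vr - factor * vp for vr, vp in zip(aug[r], aug[p])]
    return [aug[r][k] for r in range(k)]

def fitted(columns, b):
    return [sum(bj * col[i] for bj, col in zip(b, columns)) for i in range(len(columns[0]))]

ones = [1] * len(X)

# Step 1: residuals from the restricted (linear) regression (13.4.6)
lin_cols = [ones, X]
u = [y - f for y, f in zip(Y, fitted(lin_cols, ols(lin_cols, Y)))]

# Step 3: auxiliary regression (13.4.11) of the residuals on 1, X, X^2, X^3
aux_cols = [ones, X, [x ** 2 for x in X], [x ** 3 for x in X]]
u_fit = fitted(aux_cols, ols(aux_cols, u))
u_bar = sum(u) / len(u)   # zero here, because the linear model has an intercept
r2_aux = 1 - sum((a - b) ** 2 for a, b in zip(u, u_fit)) / sum((a - u_bar) ** 2 for a in u)

# Step 4: LM = n * R^2, asymptotically chi-square with 2 df (two restrictions)
lm = len(Y) * float(r2_aux)
p_value = math.exp(-lm / 2)        # upper tail of chi-square(2) is exp(-x/2)
crit_1pct = -2 * math.log(0.01)    # 1 percent critical value of chi-square(2)
print(f"R2_aux = {float(r2_aux):.4f}, nR2 = {lm:.3f}, p = {p_value:.4f}, "
      f"1% critical value = {crit_1pct:.2f}")
```

The exp(−x/2) shortcut is specific to 2 degrees of freedom; for other df a chi-square survival function from a statistics package would be needed.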


13.5 Errors of Measurement 


All along we have assumed implicitly that the dependent variable Y and the explanatory variables, the X's, 
are measured without any errors. Thus, in the regression of consumption expenditure on income and wealth 
of households, we assume that the data on these variables are “accurate”: they are not guess estimates, 
extrapolated, interpolated, or rounded off in any systematic manner, such as to the nearest hundred dollars, and so on. Unfortunately, this ideal is not met in practice for a variety of reasons, such as nonresponse
errors, reporting errors, and computing errors. Whatever the reasons, error of measurement is a potentially 
troublesome problem, for it constitutes yet another example of specification bias with the consequences noted 
below. 


Errors of Measurement in the Dependent Variable Y 


Consider the following model: 


$$Y_i^* = \alpha + \beta X_i + u_i \qquad (13.5.1)$$

where Yᵢ* = permanent consumption expenditure²⁷
Xᵢ = current income
uᵢ = stochastic disturbance term


26R. F. Engle, "A General Approach to Lagrange Multiplier Model Diagnostics," Journal of Econometrics, vol. 20, 1982, pp. 83-104.


27This phrase is due to Milton Friedman. See also Exercise 13.8.
Since Y* is not directly measurable, we may use an observable expenditure variable Yᵢ such that

$$Y_i = Y_i^* + \varepsilon_i \qquad (13.5.2)$$

where εᵢ denote errors of measurement in Yᵢ*. Therefore, instead of estimating Eq. (13.5.1), we estimate

$$\begin{aligned} Y_i &= (\alpha + \beta X_i + u_i) + \varepsilon_i \\ &= \alpha + \beta X_i + (u_i + \varepsilon_i) \\ &= \alpha + \beta X_i + v_i \end{aligned} \qquad (13.5.3)$$

where vᵢ = uᵢ + εᵢ is a composite error term, containing the population disturbance term (which may be called the equation error term) and the measurement error term.

For simplicity assume that E(uᵢ) = E(εᵢ) = 0, cov(Xᵢ, uᵢ) = 0 (which is the assumption of the classical linear regression), and cov(Xᵢ, εᵢ) = 0, that is, the errors of measurement in Yᵢ* are uncorrelated with Xᵢ, and cov(uᵢ, εᵢ) = 0, that is, the equation error and the measurement error are uncorrelated. With these assumptions, it can be seen that β estimated from either Eq. (13.5.1) or Eq. (13.5.3) will be an unbiased estimator of the true β (see Exercise 13.7); that is, the errors of measurement in the dependent variable Y do not destroy the unbiasedness property of the OLS estimators. However, the variances and standard errors of β estimated from Eqs. (13.5.1) and (13.5.3) will be different because, employing the usual formulas (see Chapter 3), we obtain


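$$\text{Model (13.5.1)}: \quad \operatorname{var}(\hat\beta) = \frac{\sigma_u^2}{\sum x_i^2} \qquad (13.5.4)$$
$$\text{Model (13.5.3)}: \quad \operatorname{var}(\hat\beta) = \frac{\sigma_v^2}{\sum x_i^2} = \frac{\sigma_u^2 + \sigma_\varepsilon^2}{\sum x_i^2} \qquad (13.5.5)$$
where, as usual, $x_i = X_i - \bar X$. Obviously, the latter variance is larger than the former.²⁸ Therefore, although the errors of measurement
in the dependent variable still give unbiased estimates of the parameters and their variances, the
estimated variances are now larger than in the case where there are no such errors of measurement.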
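The unbiasedness of $\hat\beta$ and the variance inflation in Eqs. (13.5.4) and (13.5.5) can be checked with a small Monte Carlo experiment. This is a minimal sketch, not part of the original text: the parameter values (α = 25, β = 0.6, σ_u = 10, σ_ε = 6), the fixed X values, and the helper name `ols_slope` are assumptions chosen only to echo Example 13.2. It uses the Python standard library only.

```python
import random
import statistics

def ols_slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(13)
alpha, beta = 25.0, 0.6           # assumed true parameters
sigma_u, sigma_eps = 10.0, 6.0    # assumed s.d. of equation error u and of measurement error eps
x = [80.0 + 20.0 * i for i in range(10)]    # fixed regressor: 80, 100, ..., 260
x_bar = statistics.fmean(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)    # sum of squared deviations in (13.5.4)

slopes_true, slopes_noisy = [], []
for _ in range(5000):
    y_star = [alpha + beta * xi + random.gauss(0.0, sigma_u) for xi in x]  # Eq. (13.5.1)
    y_obs = [ys + random.gauss(0.0, sigma_eps) for ys in y_star]           # Eq. (13.5.2)
    slopes_true.append(ols_slope(x, y_star))
    slopes_noisy.append(ols_slope(x, y_obs))

mean_true, mean_noisy = statistics.fmean(slopes_true), statistics.fmean(slopes_noisy)
var_true, var_noisy = statistics.variance(slopes_true), statistics.variance(slopes_noisy)
print(f"average slope: {mean_true:.4f} using Y*, {mean_noisy:.4f} using Y")
print(f"var (13.5.1): simulated {var_true:.5f}, formula {sigma_u**2 / sxx:.5f}")
print(f"var (13.5.3): simulated {var_noisy:.5f}, formula {(sigma_u**2 + sigma_eps**2) / sxx:.5f}")
```

Both average slopes should sit close to 0.6, while the simulated variance of the slope estimated with the mismeasured Y should be close to the larger value given by Eq. (13.5.5).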


Errors of Measurement in the Explanatory Variable X 


Now assume that instead of Eq. (13.5.1), we have the following model:
$$Y_i = \alpha + \beta X_i^* + u_i \qquad (13.5.6)$$
where $Y_i$ = current consumption expenditure
$X_i^*$ = permanent income
$u_i$ = disturbance term (equation error)
Suppose instead of observing $X_i^*$, we observe
$$X_i = X_i^* + w_i \qquad (13.5.7)$$


where $w_i$ represents errors of measurement in $X_i^*$. Therefore, instead of estimating Eq. (13.5.6), we estimate
$$\begin{aligned} Y_i &= \alpha + \beta(X_i - w_i) + u_i \\ &= \alpha + \beta X_i + (u_i - \beta w_i) \\ &= \alpha + \beta X_i + z_i \end{aligned} \qquad (13.5.8)$$
where $z_i = u_i - \beta w_i$, a compound of equation and measurement errors.


²⁸ But note that this variance is still unbiased because under the stated conditions the composite error term $v_i = u_i + \varepsilon_i$ still
satisfies the assumptions underlying the method of least squares.
Now even if we assume that $w_i$ has zero mean, is serially independent, and is uncorrelated with $u_i$, we
can no longer assume that the composite error term $z_i$ is independent of the explanatory variable $X_i$ because
(assuming $E(z_i) = 0$)
$$\begin{aligned} \operatorname{cov}(z_i, X_i) &= E\{[z_i - E(z_i)][X_i - E(X_i)]\} \\ &= E[(u_i - \beta w_i)\,w_i] \qquad \text{using (13.5.7)} \\ &= E(-\beta w_i^2) \\ &= -\beta\sigma_w^2 \end{aligned} \qquad (13.5.9)$$
Thus, the explanatory variable and the error term in Eq. (13.5.8) are correlated, which violates the crucial 
assumption of the classical linear regression model that the explanatory variable is uncorrelated with the 
stochastic disturbance term. If this assumption is violated, it can be shown that the OLS estimators are not 
only biased but also inconsistent, that is, they remain biased even if the sample size n increases indefinitely.²⁹
For model (13.5.8), it is shown in Appendix 13A, Section 13A.3 that 


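$$\operatorname{plim}\hat\beta = \beta\left[\frac{1}{1 + \sigma_w^2/\sigma_{X^*}^2}\right] \qquad (13.5.10)$$
where $\sigma_w^2$ and $\sigma_{X^*}^2$ are variances of $w_i$ and $X^*$, respectively, and where $\operatorname{plim}\hat\beta$ means the probability limit of $\hat\beta$.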

Since the term inside the brackets is expected to be less than 1 (why?), Eq. (13.5.10) shows that even
if the sample size increases indefinitely, $\hat\beta$ will not converge to $\beta$. Actually, if $\beta$ is assumed positive, $\hat\beta$
will underestimate $\beta$, that is, it is biased toward zero. Of course, if there are no measurement errors in
X (i.e., $\sigma_w^2 = 0$), $\hat\beta$ will provide a consistent estimator of $\beta$.
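A large simulated sample can illustrate both Eq. (13.5.9) and Eq. (13.5.10). This is a minimal sketch under assumed values that are not from the text: α = 25, β = 0.6, X* normal with mean 170 and s.d. 60, σ_w = 30, and σ_u = 10, so the bracketed factor in (13.5.10) is 1/(1 + 900/3600) = 0.8. The helper `sample_cov` is our own illustrative name.

```python
import random
import statistics

def sample_cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

random.seed(7)
n = 100_000
alpha, beta = 25.0, 0.6                            # assumed true parameters
sigma_xstar, sigma_w, sigma_u = 60.0, 30.0, 10.0   # assumed standard deviations

x_star = [random.gauss(170.0, sigma_xstar) for _ in range(n)]   # permanent income X*
w = [random.gauss(0.0, sigma_w) for _ in range(n)]              # measurement error in X
u = [random.gauss(0.0, sigma_u) for _ in range(n)]              # equation error
y = [alpha + beta * xs + ui for xs, ui in zip(x_star, u)]       # Eq. (13.5.6)
x = [xs + wi for xs, wi in zip(x_star, w)]                      # Eq. (13.5.7)
z = [ui - beta * wi for ui, wi in zip(u, w)]                    # composite error in Eq. (13.5.8)

cov_zx = sample_cov(z, x)
b_hat = sample_cov(x, y) / sample_cov(x, x)          # OLS slope of Y on the observed X
plim_b = beta / (1 + sigma_w**2 / sigma_xstar**2)    # Eq. (13.5.10)
print(f"cov(z, X) = {cov_zx:.1f}, theory -beta*sigma_w^2 = {-beta * sigma_w**2:.1f}")
print(f"OLS slope = {b_hat:.4f}, plim from (13.5.10) = {plim_b:.4f}, true beta = {beta}")
```

The covariance between the composite error and the observed regressor should be near -540, and the OLS slope should settle near 0.48 rather than 0.6, which is the attenuation toward zero described above.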

Therefore, measurement errors pose a serious problem when they are present in the explanatory variable(s)
because they make consistent estimation of the parameters impossible. Of course, as we saw, if they are
present only in the dependent variable, the estimators remain unbiased and hence they are consistent too. If
errors of measurement are present in the explanatory variable(s), what is the solution? The answer is not easy.
At one extreme, we can assume that if $\sigma_w^2$ is small compared to $\sigma_{X^*}^2$, for all practical purposes we can “assume
away” the problem and proceed with the usual OLS estimation. Of course, the rub here is that we cannot
readily observe or measure $\sigma_w^2$ and $\sigma_{X^*}^2$, and therefore there is no way to judge their relative magnitudes.

One other suggested remedy is the use of instrumental or proxy variables that, although highly correlated
with the original X variables, are uncorrelated with the equation and measurement error terms (i.e., $u_i$
and $w_i$). If such proxy variables can be found, then one can obtain a consistent estimate of $\beta$. But this task
is much easier said than done. In practice it is not easy to find good proxies; we are often in the situation of
complaining about the bad weather without being able to do much about it. Besides, it is not easy to find out
if the selected instrumental variable is in fact independent of the error terms $u_i$ and $w_i$.
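To make the instrumental-variable idea concrete, the sketch below assumes, purely for illustration, that a second and independent measurement of permanent income, $X_{2i} = X_i^* + w_{2i}$, is available. Such a variable is correlated with the observed $X_i$ (both contain $X_i^*$) but not with $u_i$ or $w_i$, so the simple IV estimator cov(X2, Y)/cov(X2, X) is consistent for β. The second measurement, the parameter values, and the variable names are hypothetical; whether such an instrument exists in practice is exactly the difficulty described above.

```python
import random
import statistics

def sample_cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

random.seed(11)
n = 100_000
alpha, beta = 25.0, 0.6                            # assumed true parameters
sigma_xstar, sigma_w, sigma_u = 60.0, 30.0, 10.0   # assumed standard deviations

x_star = [random.gauss(170.0, sigma_xstar) for _ in range(n)]
y = [alpha + beta * xs + random.gauss(0.0, sigma_u) for xs in x_star]   # Eq. (13.5.6)
x = [xs + random.gauss(0.0, sigma_w) for xs in x_star]                  # observed X, Eq. (13.5.7)
x2 = [xs + random.gauss(0.0, sigma_w) for xs in x_star]                 # hypothetical instrument

b_ols = sample_cov(x, y) / sample_cov(x, x)     # attenuated toward zero
b_iv = sample_cov(x2, y) / sample_cov(x2, x)    # simple IV estimator using x2
print(f"OLS: {b_ols:.4f}   IV: {b_iv:.4f}   true beta: {beta}")
```

OLS should again land near 0.48, while the IV estimate should be close to 0.6. The design choice here is deliberate: an independent repeat measurement is one textbook way to obtain an instrument, but it is only one of several and is assumed here rather than taken from the text.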

In the literature there are other suggestions to solve the problem.³⁰ But most of them are specific to
the given situation and are based on restrictive assumptions. There is really no satisfactory answer to the
measurement errors problem. That is why it is so crucial to measure the data as accurately as possible.


²⁹ As shown in Appendix A, $\hat\beta$ is a consistent estimator of $\beta$ if, as n increases indefinitely, the sampling distribution of $\hat\beta$
will ultimately collapse to the true $\beta$. Technically, this is stated as $\operatorname{plim}_{n\to\infty}\hat\beta = \beta$. As noted in Appendix A, consistency is
a large-sample property and is often used to study the behavior of an estimator when its finite or small-sample properties
(e.g., unbiasedness) cannot be determined.


³⁰ See Thomas B. Fomby, R. Carter Hill, and Stanley R. Johnson, Advanced Econometric Methods, Springer-Verlag, New York,
1984, pp. 273-277. See also Kennedy, op. cit., pp. 138-140, for a discussion of weighted regression as well as instrumental
variables. See also: G. S. Maddala, Introduction to Econometrics, 3d ed., John Wiley & Sons, New York, 2001, pp. 437-462,
and Quirino Paris, “Robust Estimators of Errors-in-Variables Models: Part I,” Working Paper No. 04-007, 200, Department
of Agricultural and Resource Economics, University of California at Davis, August 2004.


Example 13.2 An Example 


We conclude this section with an example constructed to highlight the preceding points.
Table 13.2 gives hypothetical data on true consumption expenditure $Y^*$, true income $X^*$, measured
consumption $Y$, and measured income $X$. The table also explains how these variables were measured.³¹
Measurement Errors in the Dependent Variable Y Only. Based on the given data, the true consumption
function is
$$\hat Y_i^* = 25.00 + 0.6000\,X_i^* \qquad (13.5.11)$$
se = (10.477)   (0.0584)
t = (2.3861)   (10.276)      $R^2 = 0.9296$
Table 13.2 Hypothetical Data on Y* (True Consumption Expenditure), X* (True Income), Y (Measured Consumption
Expenditure), and X (Measured Income); All Data in Rupees

Y*          X*        Y           X           ε          w          u
75.4666     80.00     67.6011     80.0940     -7.8655    0.0940     2.4666
74.9801     100.00    75.4438     91.5721     0.4636     -8.4279    -10.0199
102.8242    120.00    109.6956    122.1406    6.8714     2.1406     5.8242
125.7651    140.00    129.4159    145.5969    3.6509     5.5969     16.7651
106.5035    160.00    104.2388    168.5579    -2.2647    8.5579     -14.4965
131.4318    180.00    125.8319    171.4793    -5.5999    -8.5207    -1.5682
149.3693    200.00    153.9926    203.5366    4.6233     3.5366     4.3693
143.8628    220.00    152.9208    222.8533    9.0579     2.8533     -13.1372
177.5218    240.00    176.3344    232.9879    -1.1874    -7.0120    8.5218
182.2748    260.00    174.5252    261.1813    -7.7496    1.1813     1.2748

Note: The data on X* are assumed to be given. In deriving the other variables the assumptions made were as follows:
(1) $E(u) = E(\varepsilon) = E(w) = 0$; (2) $\operatorname{cov}(X, u) = \operatorname{cov}(X, \varepsilon) = \operatorname{cov}(u, \varepsilon) = \operatorname{cov}(w, u) = \operatorname{cov}(\varepsilon, w) = 0$; (3) $\sigma_u^2 = 100$, $\sigma_\varepsilon^2 = 36$, and $\sigma_w^2 = 36$;
and (4) $Y_i^* = 25 + 0.6X_i^* + u_i$, $Y_i = Y_i^* + \varepsilon_i$, and $X_i = X_i^* + w_i$.


whereas, if we use $Y_i$ instead of $Y_i^*$, we obtain
$$\hat Y_i = 25.00 + 0.6000\,X_i^* \qquad (13.5.12)$$
se = (12.218)   (0.0681)
t = (2.0461)   (8.8118)      $R^2 = 0.9066$
As these results show, and according to the theory, the estimated coefficients remain the same. The only
effect of errors of measurement in the dependent variable is that the estimated standard errors of the
coefficients tend to be larger (see Eq. [13.5.5]), which is clearly seen in Eq. (13.5.12). In passing, note that the
regression coefficients in Eqs. (13.5.11) and (13.5.12) are the same because the sample was generated to
match the assumptions of the measurement error model.
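Because Table 13.2 lists the generating errors, Eqs. (13.5.11) and (13.5.12) can be reproduced directly. The sketch below rebuilds $Y^*$ and $Y$ from the u and ε columns using note (4) of the table, then fits both two-variable regressions with the usual Chapter 3 formulas. The helper name `simple_ols` is our own; up to rounding of the tabulated values, the output should match the reported coefficients, standard errors, and $R^2$ values.

```python
import math

# Columns of Table 13.2
x_star = [80.0, 100.0, 120.0, 140.0, 160.0, 180.0, 200.0, 220.0, 240.0, 260.0]
u = [2.4666, -10.0199, 5.8242, 16.7651, -14.4965, -1.5682, 4.3693, -13.1372, 8.5218, 1.2748]
eps = [-7.8655, 0.4636, 6.8714, 3.6509, -2.2647, -5.5999, 4.6233, 9.0579, -1.1874, -7.7496]

y_star = [25.0 + 0.6 * xs + ui for xs, ui in zip(x_star, u)]   # true consumption, note (4)
y = [ys + e for ys, e in zip(y_star, eps)]                     # measured consumption

def simple_ols(x, y):
    """Two-variable OLS: returns (intercept, slope, se_intercept, se_slope, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    tss = sum((b - my) ** 2 for b in y)
    s2 = rss / (n - 2)                       # unbiased estimate of the error variance
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * sum(a * a for a in x) / (n * sxx))
    return intercept, slope, se_intercept, se_slope, 1.0 - rss / tss

fit_511 = simple_ols(x_star, y_star)   # Eq. (13.5.11): Y* on X*
fit_512 = simple_ols(x_star, y)        # Eq. (13.5.12): Y on X*
for name, fit in (("(13.5.11)", fit_511), ("(13.5.12)", fit_512)):
    print(name, "a=%.3f b=%.4f se(a)=%.3f se(b)=%.4f R2=%.4f" % fit)
```

The slope is 0.6000 in both fits, while the standard errors rise from about 0.0584 to about 0.0681, matching the text.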


³¹ I am indebted to Kenneth J. White for constructing this example. See his Computer Handbook Using SHAZAM, for use with
Damodar Gujarati, Basic Econometrics, September 1985, pp. 117-121.
Errors of Measurement in X. We know that the true regression is Eq. (13.5.11). Suppose now that instead
of using $X_i^*$ we use $X_i$. (Note: In reality $X_i^*$ is rarely observable.) The regression results are as follows:
$$\hat Y_i^* = 25.992 + 0.5942\,X_i \qquad (13.5.13)$$
se = (11.0810)   (0.0617)
t = (2.3457)   (9.6270)      $R^2 = 0.9205$

These results are in accord with the theory: when there are measurement errors in the explanatory variable(s),
the estimated coefficients are biased. Fortunately, in this example the bias is rather small. From Eq. (13.5.10)
it is evident that the bias depends on $\sigma_w^2/\sigma_{X^*}^2$, and in generating the data it was assumed that $\sigma_w^2 = 36$ and
$\sigma_{X^*}^2 = 3667$, thus making the bias factor rather small, about 0.98 percent (= 36/3667).

We leave it to the reader to find out what happens when there are errors of measurement in both Y and X,
that is, if we regress $Y_i$ on $X_i$ rather than $Y_i^*$ on $X_i^*$ (see Exercise 13.23).
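Equation (13.5.13) and the small bias factor can also be checked from Table 13.2. The sketch below builds the measured income as X = X* + w from the w column (so it relies only on the X* and w columns), regresses $Y^*$ on X, and compares the slope with the sample analogue of the bracketed factor in Eq. (13.5.10). The variable names are our own.

```python
import math
import statistics

# Columns of Table 13.2
x_star = [80.0, 100.0, 120.0, 140.0, 160.0, 180.0, 200.0, 220.0, 240.0, 260.0]
w = [0.0940, -8.4279, 2.1406, 5.5969, 8.5579, -8.5207, 3.5366, 2.8533, -7.0120, 1.1813]
u = [2.4666, -10.0199, 5.8242, 16.7651, -14.4965, -1.5682, 4.3693, -13.1372, 8.5218, 1.2748]

y_star = [25.0 + 0.6 * xs + ui for xs, ui in zip(x_star, u)]   # true consumption
x = [xs + wi for xs, wi in zip(x_star, w)]                     # measured income X = X* + w

n = len(x)
mx, my = sum(x) / n, sum(y_star) / n
sxx = sum((a - mx) ** 2 for a in x)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y_star)) / sxx
intercept = my - slope * mx
rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y_star))
se_slope = math.sqrt(rss / (n - 2) / sxx)

# Sample analogue of the bracketed factor in Eq. (13.5.10)
var_w = statistics.variance(w)            # about 36
var_xstar = statistics.variance(x_star)   # about 3667
factor = 1.0 / (1.0 + var_w / var_xstar)
print(f"Y* = {intercept:.3f} + {slope:.4f} X,  se(slope) = {se_slope:.4f}")
print(f"var_w / var_X* = {var_w / var_xstar:.4f},  0.6 * factor = {0.6 * factor:.4f}")
```

Because the table was generated so that w is uncorrelated with X* and u in the sample, the estimated slope essentially equals 0.6 times the attenuation factor, about 0.5942.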


13.6 Incorrect Specification of the Stochastic Error Term


A common problem facing a researcher is the specification of the error term $u_i$ that enters the regression
model. Since the error term is not directly observable, there is no easy way to determine the form in which it
enters the model. To see this, let us return to the models given in Eqs. (13.2.8) and (13.2.9). For simplicity of
exposition, we have assumed that there is no intercept in the model. We further assume that $u_i$ in Eq. (13.2.8)
is such that $\ln u_i$ satisfies the usual OLS assumptions.

If we assume that Eq. (13.2.8) is the “correct” model but estimate Eq. (13.2.9), what are the consequences?
It is shown in Appendix 13A, Section 13A.4, that if $\ln u_i \sim N(0, \sigma^2)$, then
$$u_i \sim \text{lognormal}\left[e^{\sigma^2/2},\; e^{\sigma^2}\left(e^{\sigma^2} - 1\right)\right] \qquad (13.6.1)$$
As a result,
$$E(\hat\alpha) = \beta e^{\sigma^2/2} \qquad (13.6.2)$$
where e is the base of the natural logarithm.
As you can see, $\hat\alpha$ is a biased estimator, as its average value is not equal to the true $\beta$. We will have more to
say about the specification of the stochastic error term in the chapter on nonlinear-in-the-parameter regression
models.
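A simulation makes the bias in Eq. (13.6.2) visible. The sketch assumes, for illustration, that the correct model (13.2.8) is the multiplicative form $Y_i = \beta X_i u_i$ and that the misspecified model (13.2.9) is the additive form $Y_i = \alpha X_i + u_i$ estimated by OLS through the origin; β = 2, σ = 0.5, and the X values 1, ..., 20 are arbitrary choices, not values from the text.

```python
import math
import random
import statistics

random.seed(5)
beta, sigma = 2.0, 0.5                  # assumed true slope and s.d. of ln u
x = [float(i) for i in range(1, 21)]    # fixed regressor values 1, ..., 20
sxx = sum(xi * xi for xi in x)

estimates = []
for _ in range(20000):
    # Multiplicative model with ln u ~ N(0, sigma^2), so u is lognormal
    y = [beta * xi * math.exp(random.gauss(0.0, sigma)) for xi in x]
    # OLS through the origin on the additive model Y = alpha*X + u
    estimates.append(sum(xi * yi for xi, yi in zip(x, y)) / sxx)

mean_alpha_hat = statistics.fmean(estimates)
theory = beta * math.exp(sigma ** 2 / 2)   # Eq. (13.6.2)
print(f"average alpha-hat = {mean_alpha_hat:.4f}, beta*e^(sigma^2/2) = {theory:.4f}, beta = {beta}")
```

The average estimate should be close to 2 × e^0.125 ≈ 2.266 rather than to the true value 2, because OLS through the origin recovers β times the mean of the lognormal error.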


13.7 Nested versus Non-Nested Models 


In carrying out specification testing, it is useful to distinguish between nested and non-nested models. To 
distinguish between the two, consider the following models: 


$$\text{Model A:}\quad Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + \beta_5 X_{5i} + u_i$$

$$\text{Model B:}\quad Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$

We say that Model B is nested in Model A because it is a special case of Model A: If we estimate Model A
and test the hypothesis that $\beta_4 = \beta_5 = 0$ and do not reject it on the basis of, say, the F test,³² Model A reduces


³² More generally, one can use the likelihood ratio test, the Wald test, or the Lagrange multiplier test, which were
discussed briefly in Chapter 8.
to Model B. If we add variable $X_4$ to Model B, then Model A will reduce to Model B if $\beta_5$ is zero; here we
will use the t test to test the hypothesis that the coefficient of $X_5$ is zero.

Without calling them such, the specification error tests that we have discussed previously and the restricted
F test that we discussed in Chapter 8 are essentially tests of nested hypotheses.
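The nested comparison of Models A and B can be sketched as a restricted F test, $F = [(\text{RSS}_B - \text{RSS}_A)/2]/[\text{RSS}_A/(n-5)]$. The sketch below uses simulated data whose coefficients (β1 = 1, β2 = 0.5, β3 = -0.8) and standard normal regressors and errors are assumptions, and a small normal-equations solver written out in plain Python. With two restrictions the F tail probability has the closed form $P(F > f) = (1 + 2f/d)^{-d/2}$ with $d = n - k$, so no statistics library is needed; the helper names are our own.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * q for a, q in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols_rss(rows, y):
    """Residual sum of squares from OLS of y on the columns of rows (normal equations)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    b = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(rows, y))

def f_test_two(rss_r, rss_u, n, k):
    """F for two restrictions and its exact p-value: P(F > f) = (1 + 2f/d)^(-d/2), d = n - k."""
    d = n - k
    f = ((rss_r - rss_u) / 2) / (rss_u / d)
    return f, (1 + 2 * f / d) ** (-d / 2)

random.seed(8)
n = 100
rows_a = [[1.0] + [random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(n)]  # 1, X2, X3, X4, X5
rows_b = [r[:3] for r in rows_a]                                                  # 1, X2, X3

def simulate(beta4, beta5):
    """Y from Model A with assumed beta1 = 1, beta2 = 0.5, beta3 = -0.8 and N(0, 1) errors."""
    return [1.0 + 0.5 * r[1] - 0.8 * r[2] + beta4 * r[3] + beta5 * r[4] + random.gauss(0.0, 1.0)
            for r in rows_a]

y_null = simulate(0.0, 0.0)   # Model B is true
y_alt = simulate(0.0, 1.0)    # X5 matters, so Model B is too restrictive
f_null, p_null = f_test_two(ols_rss(rows_b, y_null), ols_rss(rows_a, y_null), n, 5)
f_alt, p_alt = f_test_two(ols_rss(rows_b, y_alt), ols_rss(rows_a, y_alt), n, 5)
print(f"H0 true : F = {f_null:.2f}, p = {p_null:.3f}")
print(f"H0 false: F = {f_alt:.2f}, p = {p_alt:.2e}")
```

When the data come from Model B, the F value is typically small and the restriction is not rejected; when $X_5$ truly matters, F is large and the reduction to Model B is rejected.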

Now consider the following models: 


$$\text{Model C:}\quad Y_i = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + u_i$$


$$\text{Model D:}\quad Y_i = \beta_1 + \beta_2 Z_{2i} + \beta_3 Z_{3i} + v_i$$


where the X's and Z's are different variables. We say that Models C and D are non-nested because one cannot
be derived as a special case of the other. In economics, as in other sciences, more than one competing theory 
may explain a phenomenon. Thus, the monetarists would emphasize the role of money in explaining changes 
in GDP, whereas the Keynesians may explain them by changes in government expenditure. 

It may be noted here that one can allow Models C and D to contain regressors that are common to both.
For example, $X_3$ could be included in Model D and $Z_2$ could be included in Model C. Even then these are
non-nested models, because Model C does not contain $Z_3$ and Model D does not contain $X_2$.

Even if the same variables enter the model, the functional form may make two models non-nested. For 
example, consider the model: 


$$\text{Model E:}\quad Y_i = \beta_1 + \beta_2 \ln Z_{2i} + \beta_3 \ln Z_{3i} + w_i$$
Models D and E are non-nested, as one cannot be derived as a special case of the other.
Since we already have looked at tests of nested models (t and F tests), in the following section we discuss
some of the tests of non-nested models, which earlier we called model mis-specification errors.


13.8 Tests of Non-Nested Hypotheses 


According to Harvey,³³ there are two approaches to testing non-nested hypotheses: (1) the discrimination
approach, where given two or more competing models, one chooses a model based on some criteria of 
goodness of fit, and (2) the discerning approach (our terminology) where, in investigating one model, we 
take into account information provided by other models. We consider these approaches briefly. 


The Discrimination Approach 


Consider Models C and D in Section 13.7. Since both models involve the same dependent variable, we can
choose between two (or more) models based on some goodness-of-fit criterion, such as $R^2$ or adjusted $R^2$,
which we have already discussed. But keep in mind that in comparing two or more models, the regressand
must be the same. Besides these criteria, there are other criteria that are also used. These include Akaike's
information criterion (AIC), Schwarz's information criterion (SIC), and Mallows's $C_p$ criterion. We
discuss these criteria in Section 13.9. Most modern statistical software packages have one or more of these
criteria built into their regression routines. In the last section of this chapter, we will illustrate these criteria
using an extended example. On the basis of one or more of these criteria a model is finally selected that has
the highest $\bar R^2$ or the lowest value of AIC or SIC, etc.


³³ Andrew Harvey, The Econometric Analysis of Time Series, 2d ed., The MIT Press, Cambridge, Mass., 1990, Chapter 5.
The Discerning Approach


The Non-Nested F Test or Encompassing F Test 


Consider Models C and D introduced in Section 13.7. How do we choose between the two models? For this
purpose suppose we estimate the following nested or hybrid model:

$$\text{Model F:}\quad Y_i = \lambda_1 + \lambda_2 X_{2i} + \lambda_3 X_{3i} + \lambda_4 Z_{2i} + \lambda_5 Z_{3i} + u_i$$

Notice that Model F nests or encompasses Models C and D. But note that C is not nested in D and D is not
nested in C, so they are non-nested models.

Now if Model C is correct, $\lambda_4 = \lambda_5 = 0$, whereas Model D is correct if $\lambda_2 = \lambda_3 = 0$. This testing can be done
by the usual F test, hence the name non-nested F test.
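The non-nested (encompassing) F test can be sketched by fitting the artificial Model F and testing each pair of restrictions against it. In the simulated data below, Model C generates Y by assumption, and the Z's are partly correlated with the X's (coefficient 0.6 plus independent noise); all numbers and helper names are illustrative. As in the nested case, the p-value for two restrictions uses the closed form $P(F > f) = (1 + 2f/d)^{-d/2}$.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * q for a, q in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols_rss(rows, y):
    """Residual sum of squares from OLS of y on the columns of rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    b = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(rows, y))

def f_test_two(rss_r, rss_u, n, k):
    """F for two restrictions and its exact p-value, d = n - k."""
    d = n - k
    f = ((rss_r - rss_u) / 2) / (rss_u / d)
    return f, (1 + 2 * f / d) ** (-d / 2)

random.seed(9)
n = 200
x2 = [random.gauss(0.0, 1.0) for _ in range(n)]
x3 = [random.gauss(0.0, 1.0) for _ in range(n)]
z2 = [0.6 * v + random.gauss(0.0, 1.0) for v in x2]   # Z's correlated with X's (assumed)
z3 = [0.6 * v + random.gauss(0.0, 1.0) for v in x3]
y = [1.0 + 0.5 * a - 0.8 * b + random.gauss(0.0, 1.0) for a, b in zip(x2, x3)]  # Model C generates Y

rows_f = [[1.0, a, b, c, d] for a, b, c, d in zip(x2, x3, z2, z3)]  # Model F
rows_c = [r[:3] for r in rows_f]                                    # Model C: 1, X2, X3
rows_d = [[1.0, r[3], r[4]] for r in rows_f]                        # Model D: 1, Z2, Z3

rss_f = ols_rss(rows_f, y)
f_c, p_c = f_test_two(ols_rss(rows_c, y), rss_f, n, 5)   # H0: lambda4 = lambda5 = 0 (Model C)
f_d, p_d = f_test_two(ols_rss(rows_d, y), rss_f, n, 5)   # H0: lambda2 = lambda3 = 0 (Model D)
print(f"Model C restriction: F = {f_c:.2f}, p = {p_c:.3f}")
print(f"Model D restriction: F = {f_d:.2f}, p = {p_d:.2e}")
```

With Model C true by construction, the restriction that yields Model D should be firmly rejected. Making the Z's much more collinear with the X's in this sketch is an easy way to see the ambiguity discussed next.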

However, there are problems with this testing procedure. First, if the X's and the Z's are highly correlated,
then, as noted in the chapter on multicollinearity, it is quite likely that one or more of the λ's are individually
statistically insignificant, although on the basis of the F test one can reject the hypothesis that all the slope
coefficients are simultaneously zero. In this case, we have no way of deciding whether Model C or Model D is
the correct model. Second, there is another problem. Suppose we choose Model C as the reference hypothesis
or model, and find that all its coefficients are significant. Now we add $Z_2$ or $Z_3$ or both to the model and find,
using the F test, that their incremental contribution to the explained sum of squares (ESS) is statistically
insignificant. Therefore, we decide to choose Model C.

But suppose we had instead chosen Model D as the reference model and found that all its coefficients
were statistically significant. But when we add $X_2$ or $X_3$ or both to this model, we find, again using the F test,
that their incremental contribution to ESS is insignificant. Therefore, we would have chosen Model D as the
correct model. Hence, “the choice of the reference hypothesis could determine the outcome of the choice
model,”³⁴ especially if severe multicollinearity is present in the competing regressors. Finally, the artificially
nested model F may not have any economic meaning.


Example 13.3 An Illustrative Example: The St. Louis Model 


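To determine whether changes in nominal GNP can be explained by changes in the money supply (monetarism)
or by changes in government expenditure (Keynesianism), we consider the following models:

$$\begin{aligned} \dot Y_t &= \alpha + \beta_0 \dot M_t + \beta_1 \dot M_{t-1} + \beta_2 \dot M_{t-2} + \beta_3 \dot M_{t-3} + \beta_4 \dot M_{t-4} + u_{1t} \\ &= \alpha + \sum_{i=0}^{4} \beta_i \dot M_{t-i} + u_{1t} \end{aligned} \qquad (13.8.1)$$

$$\begin{aligned} \dot Y_t &= \gamma + \lambda_0 \dot E_t + \lambda_1 \dot E_{t-1} + \lambda_2 \dot E_{t-2} + \lambda_3 \dot E_{t-3} + \lambda_4 \dot E_{t-4} + u_{2t} \\ &= \gamma + \sum_{i=0}^{4} \lambda_i \dot E_{t-i} + u_{2t} \end{aligned} \qquad (13.8.2)$$

where $\dot Y_t$ = rate of growth in nominal GNP at time t
$\dot M_t$ = rate of growth in the money supply ($M_1$ version) at time t
$\dot E_t$ = rate of growth in full, or high, employment government expenditure at time t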
In passing, note that Eqs. (13.8.1) and (13.8.2) are examples of distributed-lag models, a topic 
thoroughly discussed in Chapter 17. For the time being, simply note that the effect of a unit change in the 


³⁴ Thomas B. Fomby, R. Carter Hill, and Stanley R. Johnson, Advanced Econometric Methods, Springer-Verlag, New York, 1984,
p. 416.
money supply or government expenditure on GNP is distributed over a period of time and is not instantaneous.


Since a priori it may be difficult to decide between the two competing models, let us enmesh the two
models as shown below:

$$\dot Y_t = \text{constant} + \sum_{i=0}^{4} \beta_i \dot M_{t-i} + \sum_{i=0}^{4} \lambda_i \dot E_{t-i} + u_t \qquad (13.8.3)$$

This nested model is one form in which the famous (Federal Reserve Bank of) St. Louis model, a pro-monetary-school
bank, has been expressed and estimated. The results of this model for the period 1953-I to 1976-IV for
the United States are as follows (t ratios in parentheses):³⁵


Coefficient            Estimate          Coefficient            Estimate
β0                     0.40 (2.96)       λ0                     0.08 (2.26)
β1                     0.41 (5.26)       λ1                     0.06 (2.52)
β2                     0.25 (2.14)       λ2                     0.00 (0.02)
β3                     0.06 (0.71)       λ3                     -0.06 (-2.20)
β4                     -0.05 (-0.37)     λ4                     -0.07 (-1.83)
Σβi (i = 0,...,4)      1.06 (5.59)       Σλi (i = 0,...,4)      0.03 (0.40)        (13.8.4)

$R^2 = 0.40$


What do these results suggest about the superiority of one model over the other? If we consider the cumulative
effect of a unit change in $\dot M$ and $\dot E$ on $\dot Y$, we obtain, respectively, $\sum_{i=0}^{4}\beta_i = 1.06$ and $\sum_{i=0}^{4}\lambda_i = 0.03$, the
former being statistically significant and the latter not. This comparison would tend to support the monetarist
claim that it is changes in the money supply that determine changes in the (nominal) GNP. It is left as an
exercise for the reader to critically evaluate this claim.


Davidson-MacKinnon J Test³⁶


Because of the problems just listed in the non-nested F testing procedure, alternatives have been suggested.
One is the Davidson-MacKinnon J test. To illustrate this test, suppose we want to compare hypothesis or
Model C with hypothesis or Model D. The J test proceeds as follows:

1. We estimate Model D and from it we obtain the estimated Y values, $\hat Y_i^D$.

2. We add the predicted Y value in Step 1 as an additional regressor to Model C and estimate the following
model:

$$Y_i = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \alpha_4 \hat Y_i^D + u_i \qquad (13.8.5)$$

where the $\hat Y_i^D$ values are obtained from Step 1. This model is an example of the encompassing principle, as
in the Hendry methodology.

3. Using the t test, test the hypothesis that $\alpha_4 = 0$.


³⁵ See Keith M. Carlson, “Does the St. Louis Equation Now Believe in Fiscal Policy?” Review, Federal Reserve Bank of St. Louis,
vol. 60, no. 2, February 1978, p. 17, table IV.

³⁶ R. Davidson and J. G. MacKinnon, “Several Tests for Model Specification in the Presence of Alternative Hypotheses,”
Econometrica, vol. 49, 1981, pp. 781-793.
4. If the hypothesis that $\alpha_4 = 0$ is not rejected, we can accept (i.e., not reject) Model C as the true model
because $\hat Y_i^D$ included in Eq. (13.8.5), which represents the influence of variables not included in Model C, has
no additional explanatory power beyond that contributed by Model C. In other words, Model C encompasses
Model D in the sense that the latter model does not contain any additional information that will improve the
performance of Model C. By the same token, if the null hypothesis is rejected, Model C cannot be the true
model (why?).

5. Now we reverse the roles of hypotheses, or Models C and D. We now estimate Model C first, use the
estimated Y values from this model as an additional regressor in Model D, repeat Step 4, and decide whether to
accept Model D over Model C. More specifically, we estimate the following model:

$$Y_i = \beta_1 + \beta_2 Z_{2i} + \beta_3 Z_{3i} + \beta_4 \hat Y_i^C + u_i \qquad (13.8.6)$$

where $\hat Y_i^C$ are the estimated Y values from Model C. We now test the hypothesis that $\beta_4 = 0$. If this hypothesis
is not rejected, we choose Model D over C. If the hypothesis that $\beta_4 = 0$ is rejected, we choose C over D, as
the latter does not improve over the performance of C.
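The five steps of the J test can be sketched in a few lines once an OLS routine that returns standard errors is available. This is a minimal sketch on simulated data in which Model C is, by assumption, the true model; the coefficient values, the correlation between the Z's and X's, and the helper names `solve` and `ols` are illustrative choices, not part of the text.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * q for a, q in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(rows, y):
    """OLS via the normal equations: returns (coefficients, standard errors, fitted values)."""
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    s2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - k)
    # j-th diagonal element of (X'X)^(-1), obtained by solving X'X v = e_j
    se = [math.sqrt(s2 * solve(xtx, [float(i == j) for i in range(k)])[j]) for j in range(k)]
    return coef, se, fitted

random.seed(4)
n = 200
x2 = [random.gauss(0.0, 1.0) for _ in range(n)]
x3 = [random.gauss(0.0, 1.0) for _ in range(n)]
z2 = [0.6 * v + random.gauss(0.0, 1.0) for v in x2]   # Z's partly correlated with X's (assumed)
z3 = [0.6 * v + random.gauss(0.0, 1.0) for v in x3]
y = [1.0 + 0.5 * a - 0.8 * b + random.gauss(0.0, 1.0) for a, b in zip(x2, x3)]  # Model C is true here

rows_c = [[1.0, a, b] for a, b in zip(x2, x3)]   # Model C regressors
rows_d = [[1.0, a, b] for a, b in zip(z2, z3)]   # Model D regressors

_, _, yhat_d = ols(rows_d, y)   # Step 1: fitted Y from Model D
_, _, yhat_c = ols(rows_c, y)   # fitted Y from Model C, used in Step 5

coef_a, se_a, _ = ols([r + [f] for r, f in zip(rows_c, yhat_d)], y)   # Eq. (13.8.5)
coef_b, se_b, _ = ols([r + [f] for r, f in zip(rows_d, yhat_c)], y)   # Eq. (13.8.6)
t_alpha4 = coef_a[3] / se_a[3]
t_beta4 = coef_b[3] / se_b[3]
print(f"t for alpha4 (fitted Y from D added to C): {t_alpha4:.2f}")
print(f"t for beta4  (fitted Y from C added to D): {t_beta4:.2f}")
```

Since Model C generates the data here, the t ratio for $\beta_4$ should be large (reject D), while the t ratio for $\alpha_4$ should usually be small (do not reject C), which corresponds to accepting C and rejecting D.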

Although it is intuitively appealing, the J test has some problems. Since the tests given in Eqs. (13.8.5) and
(13.8.6) are performed independently, we have the following likely outcomes:

                              Hypothesis: α4 = 0
Hypothesis: β4 = 0            Do not reject            Reject
Do not reject                 Accept both C and D      Accept D, reject C
Reject                        Accept C, reject D       Reject both C and D


As this table shows, we will not be able to get a clear answer if the J testing procedure leads to the acceptance
or rejection of both models. In case both models are rejected, neither model helps us to explain the behavior
of Y. Similarly, if both models are accepted, as Kmenta notes, “the data are apparently not rich enough to
discriminate between the two hypotheses [models].”³⁷

Another problem with the J test is that when we use the t statistic to test the significance of the estimated
Y variable in models (13.8.5) and (13.8.6), the t statistic has the standard normal distribution only asymptotically,
that is, in large samples. Therefore, the J test may not be very powerful (in the statistical sense) in small
samples because it tends to reject the true hypothesis or model more frequently than it ought to.




Example 13.4 Private Final Consumption Expenditure and Disposable Personal Income 


To illustrate the J test, consider the data given in Table 13.3. This table gives data on personal disposable
income (PDI) and private final consumption expenditure (PFCE), both measured in crores of rupees at
1999-2000 prices for the period 1951-52 to 2004-05. Consider the following rival models:

$$\text{Model A:}\quad \mathrm{PFCE}_t = \alpha_1 + \alpha_2\,\mathrm{PDI}_t + \alpha_3\,\mathrm{PDI}_{t-1} + u_t \qquad (13.8.7)$$
$$\text{Model B:}\quad \mathrm{PFCE}_t = \beta_1 + \beta_2\,\mathrm{PDI}_t + \beta_3\,\mathrm{PFCE}_{t-1} + u_t \qquad (13.8.8)$$
Model A states that PFCE depends on PDI in the current and previous time period; this model is an example
of what is known as the distributed-lag model (see Chapter 17). Model B postulates that PFCE depends on
current PDI as well as PFCE in the previous time period; this model represents what is known as the autoregressive
model (see Chapter 17 again). The reason for introducing the lagged value of PFCE in this model
is to reflect inertia or habit persistence.


³⁷ Jan Kmenta, op. cit., p. 597.
The results of estimating these models separately were as follows:

$$\text{Model A:}\quad \widehat{\mathrm{PFCE}}_t = 380587.481 + 2.466\,\mathrm{PDI}_t - 2.131\,\mathrm{PDI}_{t-1} \qquad (13.8.9)$$
t = (22.475)   (4.221)   (-3.314)      $R^2 = 0.9313$   d = 0.366
$$\text{Model B:}\quad \widehat{\mathrm{PFCE}}_t = 671.203 + 0.014\,\mathrm{PDI}_t + 1.032\,\mathrm{PFCE}_{t-1} \qquad (13.8.10)$$
t = (0.089)   (1.373)   (54.584)      $R^2 = 0.9986$   d = 2.692


Table 13.3 Private Final Consumption Expenditure (PFCE) and Personal Disposable Income (PDI) at
1999-2000 Prices (both in Rs. Crore)


Year PDI PFCE Year PDI PFCE 
1951-52 9,298 213,872 1978-79 91,507 509,819 
1952-53 9,237 222,503 1979-80 99,632 498,384 
1953-54 10,125 235,879 1980-81 123,067 543,243 
1954-55 9,392 243,617 1981-82 142,181 566,866 
1955-56 9,794 245,946 1982-83 157,291 572,536 
1956-57 11,692 256,826 1983-84 185,749 616,974 
1957-58 11,933 251,753 1984-85 207,491 634,757 
1958-59 13,362 274,864 1985-86 229,527 661,249 
1959-60 13,971 277,991 1986-87 256,413 682,116 
1960-61 14,983 293,804 1987-88 291,585 705,495 
1961-62 15,719 298,813 1988-89 345,011 749,530 
1962-63 16,698 302,706 1989-90 395,239 786,725 
1963-64 19,077 313,966 1990-91 465,097 821,863 
1964-65 22,515 332,722 1991-92 531,515 839,593 
1965-66 23,569 333,017 1992-93 618,587 861,245 
1966-67 26,957 337,344 1993-94 716,964 898,682 
1967-68 31,931 356,429 1994-95 842,261 942,359 
1968-69 33,692 365,792 1995-96 959,733 999,729 
1969-70 36,797 379,378 1996-97 1,145,206 1,077,445 
1970-71 38,898 392,262 1997-98 1,263,982 1,109,656 
1971-72 41,151 399,894 1998-99 1,474,404 1,181,797 
1972-73 45,523 402,573 1999-00 1,617,965 1,253,643 
1973-74 55,923 412,452 2000-01 1,773,250 1,292,986 
1974-75 64,968 412,141 2001-02 1,954,839 1,367,758 
1975-76 69,233 435,546 2002-03 2,064,839 1,397,069 
1976-77 73,824 444,231 2003-04 2,282,148 1,493,871 
1977-78 85,267 480,455 2004-05 2,495,015 1,579,255 


Source: Handbook of Statistics on Indian Economy (2006), Reserve Bank of India, GOI and National Accounts Statistics (2000, 2007, 
2009), Central Statistical Organization, GOI 


If one wants to choose between these two models on the basis of the discrimination approach, using the
highest $R^2$ criterion, one would probably choose Model B (13.8.10) because its $R^2$ is slightly higher than that of
Model A (13.8.9). But in Model B only past PFCE is statistically significant (there might be a collinearity
problem, though), whereas in Model A both variables are individually statistically significant. For predictive
purposes, there is not much difference between the two estimated $R^2$ values, though.

To apply the J test, suppose we assume that Model A is the null hypothesis, or the maintained model, and
Model B is the alternative hypothesis. Following the J test steps discussed earlier, we use the estimated PFCE
values from model (13.8.10) as an additional regressor in Model A. The following is the outcome from this
regression:

$$\widehat{\mathrm{PFCE}}_t = 8800.235 + 0.371\,\mathrm{PDI}_t - 0.392\,\mathrm{PDI}_{t-1} + 0.972\,\widehat{\mathrm{PFCE}}^B_t \qquad (13.8.11)$$
t = (1.341)   (4.783)   (-4.812)   (59.461)      $R^2 = 0.9990$   d = 2.371
where $\widehat{\mathrm{PFCE}}^B_t$ on the right-hand side of Eq. (13.8.11) represents the estimated PFCE values from the original
Model B (13.8.10). Since the coefficient of this variable is statistically significant with a very high t statistic of
59.46, following the J test procedure we have to reject Model A in favor of Model B.
Now we will assume Model B is the maintained hypothesis and Model A is the alternative. Following the
exact same procedure, we obtain the following results:
$$\widehat{\mathrm{PFCE}}_t = -60493.52 - 0.069\,\mathrm{PDI}_t + 0.002\,\mathrm{PFCE}_{t-1} + 0.184\,\widehat{\mathrm{PFCE}}^A_t \qquad (13.8.12)$$
t = (-4.265)   (-3.628)   (59.461)   (4.812)      $R^2 = 0.9990$   d = 2.371
where $\widehat{\mathrm{PFCE}}^A_t$ on the right-hand side of Eq. (13.8.12) represents the estimated PFCE values from the original
Model A (13.8.9). In this regression, the coefficient of $\widehat{\mathrm{PFCE}}^A_t$ is also statistically significant with a t statistic
of 4.81. This result suggests that we should now reject Model B in favor of Model A.

All this tells us is that neither model is particularly useful in explaining the behavior of private final
consumption expenditure in India over the period 1951-52 to 2004-05. Of course, we have considered only
two competing models. In reality, there may be more than two models. The J test procedure can be extended
to multiple model comparisons, although the analysis can quickly become complex.

This example shows very vividly why the CLRM assumes that the regression model used in the analysis 
is correctly specified. Obviously, in developing a model it is crucial to pay very careful attention to the 
phenomenon being modeled. 


Other Tests of Model Selection 


The J test just discussed is only one of a group of tests of model selection. There is the Cox test, the JA test,
the P test, the Mizon-Richard encompassing test, and variants of these tests. Obviously, we cannot hope
to discuss these specialized tests, for which the reader may want to consult the references cited in the various
footnotes.³⁸


13.9 Model Selection Criteria 


In this section we discuss several criteria that have been used to choose among competing models and/or to 
compare models for forecasting purposes. Here we distinguish between in-sample forecasting and out-of- 
sample forecasting. In-sample forecasting essentially tells us how the chosen model fits the data in a given 
sample. Out-of-sample forecasting is concerned with determining how a fitted model forecasts future values 
of the regressand, given the values of the regressors. 

Several criteria are used for this purpose. In particular, we discuss these criteria: (1) $R^2$, (2) adjusted
$R^2$ ($= \bar R^2$), (3) Akaike's information criterion (AIC), (4) Schwarz's information criterion (SIC), (5) Mallows's


³⁸ See also Badi H. Baltagi, Econometrics, Springer, New York, 1998, pp. 209-222.
$C_p$ criterion, and (6) forecast $\chi^2$ (chi-square). All these criteria aim at minimizing the residual sum of squares
(RSS) (or increasing the $R^2$ value). However, except for the first criterion, criteria (2), (3), (4), and (5) impose
a penalty for including an increasingly large number of regressors. Thus there is a trade-off between goodness
of fit of the model and its complexity (as judged by the number of regressors).


The $R^2$ Criterion


We know that one of the measures of goodness of fit of a regression model is $R^2$, which, as we know, is
defined as:

$$R^2 = \frac{\text{ESS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}} \qquad (13.9.1)$$

$R^2$, thus defined, of necessity lies between 0 and 1. The closer it is to 1, the better is the fit. But there are
problems with $R^2$. First, it measures in-sample goodness of fit in the sense of how close an estimated Y
value is to its actual value in the given sample. There is no guarantee that it will forecast well out-of-sample
observations. Second, in comparing two or more $R^2$s, the dependent variable, or regressand, must be the
same. Third, and more importantly, an $R^2$ cannot fall when more variables are added to the model. Therefore,
there is every temptation to play the game of “maximizing the $R^2$” by simply adding more variables to the
model. Of course, adding more variables to the model may increase $R^2$, but it may also increase the variance
of forecast error.


Adjusted $R^2$


As a penalty for adding regressors to increase the $R^2$ value, Henry Theil developed the adjusted $R^2$, denoted
by $\bar R^2$, which we studied in Chapter 7. Recall that

$$\bar R^2 = 1 - \frac{\text{RSS}/(n-k)}{\text{TSS}/(n-1)} = 1 - (1 - R^2)\,\frac{n-1}{n-k} \qquad (13.9.2)$$

As you can see from this formula, $\bar R^2 \le R^2$, showing how the adjusted $R^2$ penalizes for adding more regressors.
As we noted in Chapter 8, unlike $R^2$, the adjusted $R^2$ will increase only if the absolute t value of the added
variable is greater than 1. For comparative purposes, therefore, $\bar R^2$ is a better measure than $R^2$. But again keep
in mind that the regressand must be the same for the comparison to be valid.
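The contrast between Eqs. (13.9.1) and (13.9.2), and the rule that $\bar R^2$ rises only when the added variable has $|t| > 1$, can be checked with a few lines of arithmetic. The numbers below (n = 30, TSS = 200, a three-regressor model with RSS = 20 to which one regressor is added) are hypothetical. The $t^2$ of the added regressor is computed as the single-restriction F statistic, $(\text{RSS}_{old} - \text{RSS}_{new})/[\text{RSS}_{new}/(n - k_{new})]$.

```python
def r_squared(rss, tss):
    """Eq. (13.9.1): R^2 = 1 - RSS/TSS."""
    return 1.0 - rss / tss

def adjusted_r_squared(r2, n, k):
    """Eq. (13.9.2): 1 - (1 - R^2)(n - 1)/(n - k), with k counting the intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

def compare(rss_old, rss_new, n=30, k_old=3, tss=200.0):
    """Add one regressor: change in R^2, change in adjusted R^2, and t^2 of the new regressor."""
    k_new = k_old + 1
    r2_old, r2_new = r_squared(rss_old, tss), r_squared(rss_new, tss)
    adj_old = adjusted_r_squared(r2_old, n, k_old)
    adj_new = adjusted_r_squared(r2_new, n, k_new)
    t_sq = (rss_old - rss_new) / (rss_new / (n - k_new))   # F for one restriction equals t^2
    return r2_new - r2_old, adj_new - adj_old, t_sq

for rss_new in (19.5, 18.0):   # hypothetical RSS after adding the fourth regressor
    d_r2, d_adj, t_sq = compare(20.0, rss_new)
    print(f"RSS 20 -> {rss_new}: change in R2 = {d_r2:+.4f}, "
          f"change in adjusted R2 = {d_adj:+.4f}, t^2 = {t_sq:.3f}")
```

In the first case $R^2$ rises but $\bar R^2$ falls because $t^2 \approx 0.67 < 1$; in the second case $t^2 \approx 2.89 > 1$ and both measures rise.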


Akaike’s Information Criterion (AIC) 


The idea of imposing a penalty for adding regressors to the model has been carried further in the AIC criterion, 
which is defined as: 


ee, a RSS 
n 


= ein (13.9.3) 


where k is the number of regressors (including the intercept) and n is the number of observations. For mathe- 
matical convenience, Eq. (13.9.3) is written as 


In AIC = (=) + in( <=) (13.9.4) 
where ln AIC = natural log of AIC and 2k/n = penalty factor. Some textbooks and software packages define
AIC only in terms of its log transform, so there is no need to put ln before AIC. As you see from this formula,
AIC imposes a harsher penalty than $\bar{R}^2$ for adding more regressors. In comparing two or more models,
the model with the lowest value of AIC is preferred. One advantage of AIC is that it is useful for assessing not only
the in-sample but also the out-of-sample forecasting performance of a regression model. Also, it is useful for both
nested and non-nested models. It has also been used to determine the lag length in an AR(p) model.
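Comparing models by Eq. (13.9.4) takes only a few lines. The sketch below uses two hypothetical nested models fitted to the same regressand. The RSS values are invented for illustration, and `ln_aic` is a hypothetical helper name.

```python
import math


def ln_aic(rss: float, n: int, k: int) -> float:
    """Log form of Akaike's criterion, Eq. (13.9.4): 2k/n + ln(RSS/n).

    k is the number of estimated coefficients, including the intercept.
    """
    return 2.0 * k / n + math.log(rss / n)


# Hypothetical nested models for the same regressand, n = 50.
models = {
    "A (k=2)": ln_aic(rss=100.0, n=50, k=2),
    "B (k=4)": ln_aic(rss=95.0, n=50, k=4),
}
preferred = min(models, key=models.get)  # lowest AIC is preferred

for name, value in models.items():
    print(f"model {name}: ln AIC = {value:.4f}")
print("preferred:", preferred)
```

Model B lowers RSS by 5 percent. That gain is not enough to offset its larger penalty 2k/n, so model A has the lower AIC.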


Schwarz’s Information Criterion (SIC) 


Similar in spirit to the AIC, the SIC criterion is defined as: 


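$$\text{SIC} = n^{k/n}\,\frac{\sum \hat{u}_i^2}{n} = n^{k/n}\,\frac{\text{RSS}}{n} \qquad (13.9.5)$$

or in log form:

$$\ln \text{SIC} = \frac{k}{n}\ln n + \ln\left(\frac{\text{RSS}}{n}\right) \qquad (13.9.6)$$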


where (k/n) ln n is the penalty factor. SIC imposes a harsher penalty than AIC (as long as n ≥ 8, so that ln n > 2), as is obvious from comparing
Eq. (13.9.6) to Eq. (13.9.4). Like AIC, the lower the value of SIC, the better the model. Again, like AIC, SIC
can be used to compare in-sample or out-of-sample forecasting performance of a model.
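The two penalty terms can be compared directly. (k/n) ln n exceeds 2k/n exactly when ln n > 2, that is, from n = 8 onward. The following sketch checks this for a few hypothetical sample sizes with k = 3. The function names are illustrative only.

```python
import math


def ln_sic(rss: float, n: int, k: int) -> float:
    """Log form of Schwarz's criterion, Eq. (13.9.6): (k/n) ln n + ln(RSS/n)."""
    return (k / n) * math.log(n) + math.log(rss / n)


def aic_penalty(n: int, k: int) -> float:
    """Penalty term 2k/n of Eq. (13.9.4)."""
    return 2.0 * k / n


def sic_penalty(n: int, k: int) -> float:
    """Penalty term (k/n) ln n of Eq. (13.9.6)."""
    return (k / n) * math.log(n)


k = 3
for n in (5, 7, 8, 20, 100):
    harsher = "SIC" if sic_penalty(n, k) > aic_penalty(n, k) else "AIC"
    print(f"n={n:>3}: AIC penalty {aic_penalty(n, k):.4f}, "
          f"SIC penalty {sic_penalty(n, k):.4f} -> harsher: {harsher}")
```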


Mallows’s C_p Criterion


Suppose we have a model consisting of k regressors, including the intercept. Let $\hat{\sigma}^2$ as usual be the estimator
of the true σ², computed from this full model. But suppose that we choose only p regressors (p ≤ k) and obtain the RSS from the regression
using these p regressors. Let RSS_p denote the residual sum of squares using the p regressors. Now C. P.
Mallows has developed the following criterion for model selection, known as the C_p criterion:


$$C_p = \frac{\text{RSS}_p}{\hat{\sigma}^2} - (n - 2p) \qquad (13.9.7)$$

where n is the number of observations.
We know that $\hat{\sigma}^2$ is an unbiased estimator of the true σ². Now, if the model with p regressors is
adequate in that it does not suffer from lack of fit, it can be shown³⁹ that E(RSS_p) = (n − p)σ². In consequence,
it is approximately true that

$$E(C_p) \approx \frac{(n - p)\sigma^2}{\sigma^2} - (n - 2p) \approx p \qquad (13.9.8)$$


In choosing a model according to the C_p criterion, we would look for a model that has a low C_p value, about
equal to p. In other words, following the principle of parsimony, we will choose a model with p regressors
(p < k) that gives a fairly good fit to the data.
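Eq. (13.9.7) can be evaluated directly once RSS_p and the full-model $\hat{\sigma}^2$ are known. The sketch below uses hypothetical numbers for three candidate subsets. The selection rule at the end is an illustrative rule of thumb, not something the text prescribes: pick the smallest p whose C_p lies within 1 of the C_p = p line.

```python
def mallows_cp(rss_p: float, sigma2_hat: float, n: int, p: int) -> float:
    """Mallows's C_p, Eq. (13.9.7).

    sigma2_hat should come from the full model with all k regressors;
    p counts the coefficients (intercept included) in the candidate model.
    """
    return rss_p / sigma2_hat - (n - 2 * p)


# Hypothetical example: n = 40, full-model sigma^2-hat = 2.0,
# and the RSS of three candidate subsets indexed by p.
n, sigma2_hat = 40, 2.0
rss_by_p = {2: 150.0, 3: 74.0, 4: 72.0}

cp = {p: mallows_cp(rss, sigma2_hat, n, p) for p, rss in rss_by_p.items()}
for p, value in cp.items():
    print(f"p={p}: C_p = {value:.2f}  (C_p - p = {value - p:+.2f})")

# Illustrative rule of thumb (an assumption, not from the text): the smallest
# p whose C_p is no more than 1 above the C_p = p line.
chosen = min(p for p, value in cp.items() if value <= p + 1.0)
print("chosen p:", chosen)
```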


39Norman R. Draper and Harry Smith, Applied Regression Analysis, 3d ed., John Wiley & Sons, New York, 1998, p. 332. See
this book for some worked examples of C_p.
In practice, one usually plots C_p computed from Eq. (13.9.7) against p. An “adequate” model will show
up as a point close to the C_p = p line, as can be seen from Figure 13.3. As this figure shows, Model A may be
preferable to Model B, as it is closer to the C_p = p line than Model B.


Figure 13.3 Mallows’s C_p plot.


A Word of Caution about Model Selection Criteria 


We have discussed several model selection criteria. But one should look at these criteria as an adjunct to the
various specification tests we have discussed in this chapter. Some of the criteria discussed above are purely
descriptive and may not have strong theoretical properties. Some of them may even be open to the charge
of data mining. Nonetheless, they are so frequently used by practitioners that the reader should be aware
of them. No one of these criteria is necessarily superior to the others.⁴⁰ Most modern software packages
now include R², adjusted R², AIC, and SIC. Mallows’s C_p is not routinely given, although it can be easily
computed from its definition.


Forecast Chi-Square (χ²)


Suppose we have a regression model based on n observations and suppose we want to use it to forecast the 
(mean) values of the regressand for an additional t observations. As noted elsewhere, it is a good idea to save 
part of the sample data to see how the estimated model forecasts the observations not included in the sample, 
the post-sample period. 

Now the forecast χ² test is defined as follows:

$$\text{Forecast } \chi^2 = \frac{\sum_{i=n+1}^{n+t} \hat{u}_i^2}{\hat{\sigma}^2} \qquad (13.9.9)$$

where $\hat{u}_i$ is the forecast error made for period i (= n + 1, n + 2, ..., n + t), using the parameters obtained from
the fitted regression and the values of the regressors in the post-sample period, and $\hat{\sigma}^2$ is the usual OLS estimator
of σ² based on the fitted regression.


40For a useful discussion on this topic, see Francis X. Diebold, Elements of Forecasting, 2d ed., South Western Publishing, 
2001, pp. 83-89. On balance, Diebold recommends the SIC criterion. 
If we hypothesize that the parameter values have not changed between the sample and post-sample periods,
it can be shown that the statistic given in Eq. (13.9.9) follows the chi-square distribution with t degrees of
freedom, where t is the number of periods for which the forecast is made. As Charemza and Deadman note,
the forecast χ² test has weak statistical power. This means that the probability that the test will correctly reject
a false null hypothesis is low, so the test should be used as a signal rather than as a definitive test.⁴¹
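The statistic in Eq. (13.9.9) can be sketched in a few lines. The inputs are assumed to be the post-sample actual values, the forecasts made with the first-period parameters, and the first-period $\hat{\sigma}^2$. All numbers below are hypothetical, and `forecast_chi_square` is a hypothetical helper name.

```python
def forecast_chi_square(actual, predicted, sigma2_hat):
    """Forecast chi-square of Eq. (13.9.9).

    actual, predicted: post-sample values of Y and the forecasts made with
    the parameters estimated on the first n observations.
    sigma2_hat: OLS estimate of the error variance from that fitted regression.
    Returns the statistic and its degrees of freedom t.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [a - f for a, f in zip(actual, predicted)]
    statistic = sum(e * e for e in errors) / sigma2_hat
    return statistic, len(errors)


# Hypothetical post-sample period of t = 3 observations.
stat, df = forecast_chi_square([10.0, 12.0, 11.0], [9.0, 13.0, 9.0], sigma2_hat=2.0)
print(f"forecast chi-square = {stat:.3f} with {df} df")
# 7.815 is the 5 percent critical value of chi-square with 3 df.
print("reject parameter constancy at 5%?", stat > 7.815)
```

The test has weak power, so a non-rejection such as this one is at most a weak signal that the parameters are stable.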


13.10 Additional Topics in Econometric Modeling 


As noted in the introduction to this chapter, the topic of econometric modeling and diagnostic testing is so
vast and evolving that specialized books are written on this topic. In the previous section we touched on
some major themes in this area. In this section we consider a few additional features that researchers may find
useful in practice. In particular, we consider these topics: (1) outliers, leverage, and influence; (2) recursive
least squares; (3) Chow’s prediction failure test; and (4) missing data. Of necessity the discussion of each of these topics
will be brief.


Outliers, Leverage, and Influence⁴²


Recall that, in minimizing the residual sum of squares (RSS), OLS gives equal weight to every observation
in the sample. But not every observation has an equal impact on the regression results, because of the
presence of three types of special data points called outliers, leverage points, and influential points. It is important
that we know what they are and how they affect regression analysis.

In the regression context, an outlier may be defined as an observation with a “large residual.” Recall that
$\hat{u}_i = (Y_i - \hat{Y}_i)$; that is, the residual represents the difference (positive or negative) between the actual value
of the regressand and its value estimated from the regression model.

When we say that a residual is large, it is in comparison with the other residuals and very often such a large 
residual catches our attention immediately because of its rather large vertical distance from the estimated 
regression line. Note that in a data set there may be more than one outlier. We have already encountered an 
example of this in Exercise 11.22, where you were asked to regress percent change in stock prices (Y) on 
percent change in consumer prices (X) for a sample of 20 countries. One observation, that relating to Chile, 
was an outlier. 

A data point is said to exert (high) leverage if it is disproportionately distant from the bulk of the values
of a regressor(s). Why does a leverage point matter? It matters because it is capable of pulling the regression
line toward itself, thus distorting the slope of the regression line. If this actually happens, then we call such a 
leverage (data) point an influential point. The removal of such a data point from the sample can dramatically 
affect the regression line. Returning to Exercise 11.22, you will see that if you regress Y on X including the 
observation for Chile, the slope coefficient is positive and “highly statistically significant.” But if you drop 
the observation for Chile, the slope coefficient is practically zero. Thus the Chilean observation has leverage 
and is also an influential observation. 
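Leverage is commonly measured by the diagonal elements h_i of the "hat" matrix. That measure is not defined in this section. For a regression of Y on an intercept and one regressor it reduces to h_i = 1/n + (X_i − X̄)²/Σ(X_j − X̄)². The sketch below computes h_i for a hypothetical regressor with one extreme value. It flags points above 2p/n, a widely used rule of thumb that is an assumption here (p = 2 coefficients).

```python
def leverages(x):
    """Hat-matrix diagonal h_i for a regression of Y on an intercept and x."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Hypothetical regressor: nine ordinary values and one far from the bulk.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
h = leverages(x)
p = 2                          # intercept and slope
cutoff = 2 * p / len(x)        # rule-of-thumb threshold 2p/n
flagged = [i for i, hi in enumerate(h) if hi > cutoff]

print("leverages:", [round(hi, 3) for hi in h])
print("high-leverage observations (0-based):", flagged)
```

The leverages always sum to p. High leverage alone does not make a point influential; whether it does depends on where its Y value lies, as panel (c) of Figure 13.4 illustrates.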


41Wojciech W. Charemza and Derek F. Deadman, New Directions in Econometric Practice: A General to Specific Modelling,
Cointegration and Vector Autoregression, 2d ed., Edward Elgar Publishers, 1997, p. 30. See also pp. 250-252 for their views
on various model selection criteria.


42The following discussion is influenced by Chandan Mukherjee, Howard White, and Marc Wuyts, Econometrics and Data
Analysis for Developing Countries, Routledge, New York, 1998, pp. 137-148.
(a) (b) (c)
Source: Adapted from John Fox, op. cit., p. 268.
Figure 13.4 In each subfigure, the solid line gives the OLS line for all the data and the broken line gives the OLS line
with the outlier omitted. In (a), the outlier is near the mean value of X and has low leverage and little influence
on the regression coefficients. In (b), the outlier is far away from the mean value of X and has high leverage as well as
substantial influence on the regression coefficients. In (c), the outlier has high leverage but low influence on the regression
coefficients because it is in line with the rest of the observations.


To further clarify the nature of outliers, leverage, and influence points, consider the diagram in
Figure 13.4, which is self-explanatory.⁴³

How do we handle such data points? Should we just drop them and confine our attention to the remaining 
data points? According to Draper and Smith: 


Automatic rejection of outliers is not always a wise procedure. Sometimes the outlier is providing information that 
other data points cannot due to the fact that it arises from an unusual combination of circumstances which may 
be of vital interest and requires further investigation rather than rejection. As a general rule, outliers should be 
rejected out of hand only if they can be traced to causes such as errors of recording the observations or setting up 
the apparatus [in a physical experiment]. Otherwise, careful investigation is in order.⁴⁴


What are some of the tests that one can use to detect outliers and leverage points? There are several tests
discussed in the literature, but we will not discuss them here because that would take us far afield.⁴⁵ Software
packages such as SHAZAM and MICROFIT have routines to detect outliers, leverage, and influential points.


Recursive Least Squares 


In Chapter 8 we examined the question of the structural stability of a regression model involving time series 
data and showed how the Chow test can be used for this purpose. Specifically, you may recall that in that 
chapter we discussed a simple savings function (savings as a function of income) for India for the period 
1974-75 to 1995-96. There we saw that the savings–income relationship probably changed around 1990.
Knowing the point of the structural break, we were able to confirm it with the Chow test.


43Adapted from John Fox, Applied Regression Analysis, Linear Models, and Related Methods, Sage Publications, California,
1997, p. 268.

44Norman R. Draper and Harry Smith, op. cit., p. 76.

45Here are some accessible sources: Alvin C. Rencher, Linear Models in Statistics, John Wiley & Sons, New York, 2000, pp. 
219-224; A. C. Atkinson, Plots, Transformations and Regression: An Introduction to Graphical Methods of Diagnostic Regression 
Analysis, Oxford University Press, New York, 1985, Chapter 3; Ashis Sen and Muni Srivastava, Regression Analysis: Theory, 
Methods, and Applications, Springer-Verlag, New York, 1990, Chapter 8; and John Fox, op. cit., Chapter 11. 
But what happens if we do not know the point of the structural break (or breaks)? This is where one can
use recursive least squares (RELS). The basic idea behind RELS is very simple and can be explained with
the savings–income regression

$$Y_t = \beta_1 + \beta_2 X_t + u_t$$

where Y = savings and X = income and where the sample is for the period 1974-75 to 1995-96. (See the data
in Table 8.11.)

Suppose we first use the data for 1974-75 to 1980-81 and estimate the savings function, obtaining the
estimates of $\beta_1$ and $\beta_2$. Then we use the data for 1974-75 to 1981-82 and again estimate the savings function
and obtain the estimates of the two parameters. Then we use the data for 1974-75 to 1982-83 and re-estimate
the savings model. In this fashion we go on adding an additional data point on Y and X until we exhaust
the entire sample. As you can imagine, each regression run will give you a new set of estimates of $\beta_1$ and
$\beta_2$. If you plot the estimated values of these parameters against each iteration, you will see how the values
of the estimated parameters change. If the model under consideration is structurally stable, the changes in the
estimated values of the two parameters will be small and essentially random. However, if the estimated values
of the parameters change significantly, it would indicate a structural break. RELS is thus a useful routine with
time series data since time is ordered chronologically. It is also a useful diagnostic tool in cross-sectional data
where the data are ordered by some “size” or “scale” variable, such as the employment or asset size of the
firm. In Exercise 13.30 you are asked to apply RELS to the savings data given in Table 8.11.
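The expanding-window procedure just described can be sketched for the two-variable model by re-running OLS on observations 1..m for each m. The India savings data of Table 8.11 are not reproduced here. The series below is hypothetical and has a slope change built in after the eighth observation. The helper names are illustrative.

```python
def ols_two_variable(x, y):
    """OLS intercept and slope for Y_t = b1 + b2 X_t + u_t."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


def recursive_estimates(x, y, first_window):
    """Re-estimate on expanding samples 1..m for m = first_window, ..., n."""
    return [(m, *ols_two_variable(x[:m], y[:m]))
            for m in range(first_window, len(x) + 1)]


# Hypothetical data with a slope change after the 8th observation.
x = list(range(1, 13))
y = [2 + 0.5 * xi if xi <= 8 else 2 + 0.9 * xi for xi in x]

path = recursive_estimates(x, y, first_window=3)
for m, b1, b2 in path:
    print(f"obs 1-{m:>2}: b1 = {b1:7.3f}, b2 = {b2:6.3f}")
```

The estimates stay at (2, 0.5) through observation 8 and then drift steadily. This is the kind of pattern that, when plotted, points to a break.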

Software packages such as SHAZAM, EViews, and MICROFIT now do recursive least-squares estimates
routinely. RELS also generates recursive residuals on which several diagnostic tests have been based.⁴⁶


Chow’s Prediction Failure Test 


We have already discussed Chow’s test of structural stability in Chapter 8. Chow has shown that his test
can be modified to test the predictive power of a regression model. Again, we will revert to India’s
savings–income regression for the period 1974-75 to 1995-96.

Suppose we estimate the savings–income regression for the period 1974-75 to 1988-89, obtaining
$\hat{\beta}_1$ and $\hat{\beta}_2$, the estimated intercept and slope coefficients based on the data
for 1974-75 to 1988-89. Now, using the actual values of income for the period 1989-90 to 1995-96 and
the intercept and slope values for the period 1974-75 to 1988-89, we predict the values of savings for each
year from 1989-90 to 1995-96. The logic here is that if there is no serious structural change in the parameter
values, the values of savings estimated for 1989-90 to 1995-96, based on the parameter estimates for the
earlier period, should not be very different from the actual values of savings prevailing in the latter period. Of
course, if there is a vast difference between the actual and predicted values of savings for the latter period, it
will cast doubt on the stability of the savings–income relation for the entire data period.

Whether the difference between the actual and estimated savings values is large or small can be tested by
the F test as follows:

$$F = \frac{\left(\sum \hat{u}_t^{*2} - \sum \hat{u}_t^2\right)/n_2}{\left(\sum \hat{u}_t^2\right)/(n_1 - k)} \qquad (13.10.1)$$

where $n_1$ = number of observations in the first period (1974-75 to 1988-89) on which the initial regression
is based, $n_2$ = number of observations in the second or forecast period, $\sum \hat{u}_t^{*2}$ = RSS when the equation is


46For details, see Jack Johnston and John DiNardo, Econometric Methods, 4th ed., McGraw-Hill, New York, 1997,
pp. 117-121.
estimated for all the observations ($n_1 + n_2$), $\sum \hat{u}_t^2$ = RSS when the equation is estimated for the first $n_1$
observations, and k is the number of parameters estimated (two in the present instance). If the errors are
independent and identically, normally distributed, the F statistic given in Eq. (13.10.1) follows the F
distribution with $n_2$ and $(n_1 - k)$ df, respectively. In Exercise 13.31 you are asked to apply Chow’s predictive failure test
to find out if the savings–income relation has in fact changed. In passing, note the similarity between this test
and the forecast χ² test discussed previously.
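Eq. (13.10.1) needs only two RSS values: one from the full sample and one from the first $n_1$ observations. The sketch below computes it for a two-variable regression on hypothetical data, one series with a stable relation and one with a shift in the last five periods. It does not use the India savings data, and the helper names are illustrative.

```python
def ols_rss(x, y):
    """RSS from an OLS regression of y on an intercept and x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


def chow_prediction_failure(x, y, n1, k=2):
    """F statistic of Eq. (13.10.1) and its (numerator, denominator) df."""
    n2 = len(x) - n1
    rss_all = ols_rss(x, y)                # all n1 + n2 observations
    rss_first = ols_rss(x[:n1], y[:n1])    # first n1 observations only
    f = ((rss_all - rss_first) / n2) / (rss_first / (n1 - k))
    return f, (n2, n1 - k)


# Hypothetical data: 15 periods, n1 = 10, small alternating disturbances.
x = list(range(15))
noise = [0.1 if t % 2 == 0 else -0.1 for t in x]
y_stable = [1 + 2 * xt + e for xt, e in zip(x, noise)]
y_shift = [v if t < 10 else 10 + 2 * x[t] for t, v in enumerate(y_stable)]

f_stable, df = chow_prediction_failure(x, y_stable, n1=10)
f_shift, _ = chow_prediction_failure(x, y_shift, n1=10)
print(f"stable relation:  F = {f_stable:.3f}, df = {df}")
print(f"shifted relation: F = {f_shift:.1f}, df = {df}")
```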


Missing Data 


In applied work it is not uncommon to find that sometimes observations are missing from the sample data. 
For example, in time series data there may be gaps in the data because of special circumstances. During 
the Second World War, data on some macro variables were not available or were not published for strategic 
reasons. In cross-section data it is not uncommon to find that information on some variables for some 
individuals is missing, especially in data collected from questionnaire-type surveys. In panel data, too, some
respondents drop out over time or do not provide information on all the questions.

Whatever the reason, missing data is a problem that every researcher faces from time to time. The question 
is how we deal with the missing data. Is there any way to impute values to the missing observations? 

This is not an easy question to answer. Although there are some complicated solutions suggested in the
literature, we will not pursue them here because of their complexity.⁴⁷ However, we will discuss two cases.⁴⁸
In the first case, the reasons for the missing data are independent of the available observations; Darnell calls
this the “ignorable case.” In the second case, not only are the available data incomplete, but the
missing observations may be systematically related to the available data. This is a more serious case, for it
may be the result of self-selection bias, that is, the observed data are not truly randomly collected.

In the ignorable case, we may simply ignore the missing observations and use the available observations. 
Most statistical packages do this automatically. Of course, in this case the sample size is reduced and we may 
not be able to get precise estimates of the regression coefficients. We might use the available data to shed 
some light on the missing observations, however. Here we consider three possibilities. 


1. Out of a total number of observations N, we have complete data on $N_1$ ($N_1 < N$) for both the regressand
and k regressors, denoted by $Y_1$ and $X_1$, respectively. ($Y_1$ is a vector of $N_1$ observations, and each row of $X_1$ is a row
vector of the k regressors.)

2. For some observations ($N_2 < N$) there are complete data on the regressand, denoted by $Y_2$, but incomplete
observations on some of the regressors, $X_2$ (again these are vectors).

3. For some observations ($N_3 < N$), there are no data on Y but complete data on X, denoted by $X_3$.


In the first case, regression of $Y_1$ on $X_1$ will produce estimates of the regression coefficients that are
unbiased, but they may not be efficient because we ignore the $N_2$ and $N_3$ observations. The other two cases are
rather complicated, and we leave it to the reader to follow the references for solutions.⁴⁹
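Before any estimation, it helps to know how many records fall into each of the three cases above. The sketch below sorts hypothetical (y, x1, ..., xk) records, with `None` marking a missing value, into those groups plus a residual group in which both Y and some X are missing. Complete-case estimation of the kind described for the first case would then use only the first group.

```python
def partition_by_missingness(rows):
    """Split (y, x1, ..., xk) records, with None marking a missing value,
    into the three cases described above plus a residual group."""
    groups = {"case 1: complete": [],
              "case 2: Y observed, some X missing": [],
              "case 3: Y missing, X complete": [],
              "other": []}
    for row in rows:
        y, xs = row[0], row[1:]
        y_ok = y is not None
        x_ok = all(v is not None for v in xs)
        if y_ok and x_ok:
            groups["case 1: complete"].append(row)
        elif y_ok:
            groups["case 2: Y observed, some X missing"].append(row)
        elif x_ok:
            groups["case 3: Y missing, X complete"].append(row)
        else:
            groups["other"].append(row)
    return groups


# Hypothetical survey records (y, x1, x2).
records = [(5.1, 12, 3), (4.8, None, 7), (None, 16, 2), (6.0, 14, 9), (None, None, 1)]
groups = partition_by_missingness(records)
for name, members in groups.items():
    print(f"{name}: {len(members)}")

# Complete-case ("listwise") estimation would use only case 1.
complete_cases = groups["case 1: complete"]
```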


47For a thorough, but rather advanced, treatment of the subject, see A. Colin Cameron and Pravin K. Trivedi, Microecono- 
metrics: Methods and Applications, Cambridge University Press, New York, 2005, Chapter 27, pp. 923-941. 

48The following discussion is based on Adrian C. Darnell, A Dictionary of Econometrics, Edward Elgar Publishing, Lyne, U.K., 
1994, pp. 256-258. 

49Besides the references already cited, see A. A. Afifi and R. M. Elashoff, “Missing Observations in Multivariate Statistics,”
Journal of the American Statistical Association, vol. 61, 1966, pp. 595-604, and vol. 62, 1967, pp. 10-29.
13.11 Concluding Examples


We conclude this chapter with two examples that illustrate one or more points raised in the chapter. The first 
example on wage determination uses cross-section data and the second example, which considers the real 
consumption function for the U.S., uses time series data. 


1. A Model of Hourly Wage Determination


To examine what factors determine hourly wages, we consider a Mincer-type wage model, which has become
popular with labor economists. This model has the following form:⁵⁰


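$$\ln \text{wage}_i = \beta_1 + \beta_2 \text{Edu}_i + \beta_3 \text{Exp}_i + \beta_4 \text{Fe}_i + \beta_5 \text{NW}_i + \beta_6 \text{UN}_i + \beta_7 \text{WK}_i + u_i \qquad (13.11.1)$$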


where ln wage = natural log of hourly wage ($), Edu = education in years, Exp = labor market experience (years),
Fe = 1 if female, 0 otherwise, NW = 1 if non-white, 0 otherwise, UN = 1 if in union, 0 otherwise, and
WK = 1 for non-hourly paid workers, 0 otherwise. For the non-hourly paid workers, the hourly wage is
computed as weekly earnings divided by the usual hours worked.

There are many more variables that could be added to this model. Some of these variables are ethnic
origin, marital status, number of children under age 6, and wealth or non-labor income. For now, we will
work with the model shown in Eq. (13.11.1).

The data consist of 1,289 persons interviewed in March 1985 as part of the Current Population Survey
(CPS) periodically conducted by the U.S. Census Bureau. These data were originally collected by Paul
Ruud.⁵¹

A priori, we would expect education and experience to have a positive impact on wages. The dummy
variables Fe and NW are expected to have a negative impact on wages if there is some kind of discrimination,
and UN is expected to have a positive impact because of uncertainty of income.

When all the dummy variables take a value of zero, Eq. (13.11.1) reduces to

$$\ln \text{wage}_i = \beta_1 + \beta_2 \text{Edu}_i + \beta_3 \text{Exp}_i + u_i \qquad (13.11.2)$$


which is the wage function for a non-unionized white male worker who is on an hourly wage rate. This is the 
base, or reference, category. 

The regression results are presented in Table 13.4; let us now discuss them.

The first thing to notice is that all the estimated coefficients are individually highly significant, for the
p-values are so low. The F statistic is also very high, suggesting that collectively, too, all the variables are statistically
important.

Compared to the reference worker, the average wage of a female worker and of a non-white worker is lower.
Union workers and non-hourly (weekly) paid workers, on average, earn higher wages.
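Because the regressand is in logs, a dummy coefficient b is often converted to an exact percentage difference, 100(e^b − 1), instead of being read directly as 100b. This is a standard transformation, but this section does not state it, so treat the sketch below as illustrative. The coefficients are those reported for FE, NW, and UN in Table 13.4.

```python
import math


def dummy_percent_effect(coef: float) -> float:
    """Exact percentage difference in the level of Y implied by a dummy
    coefficient in a log-linear model: 100 * (exp(coef) - 1)."""
    return 100.0 * (math.exp(coef) - 1.0)


# Dummy coefficients as reported in Table 13.4.
coefficients = {"FE": -0.234934, "NW": -0.124447, "UN": 0.207508}
effects = {name: dummy_percent_effect(b) for name, b in coefficients.items()}
for name, b in coefficients.items():
    print(f"{name}: 100*b = {100 * b:6.2f}%, exact = {effects[name]:6.2f}%")
```

For coefficients of this size the two readings differ by two to three percentage points. For example, the female differential is about 21 percent rather than 23.5 percent.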

How adequate is model (13.11.1), given the variables we have considered? Is it possible that non-white
female workers earn less than white workers? Is it possible that non-white female non-union workers earn less
than white female non-union workers? In other words, are there any interaction effects among the regressors,
in particular among the dummy variables?

Statistical packages have routines to answer such questions. For instance, EViews has such a facility. After
a model is estimated, if you think that some variables can be added to the model but you are not sure of their
importance, you can run the test of omitted variables.


50See J. Mincer, Schooling, Experience, and Earnings, Columbia University Press, New York, 1974.


51Paul A. Ruud, An Introduction to Classical Econometric Theory, Oxford University Press, New York, 2000. We have not
included data on age because it is highly collinear with job experience.
Table 13.4 EViews Regression Results Based on Equation (13.11.1)

Dependent Variable: LW
Method: Least Squares
Sample: 1-1,289
Included observations: 1,289

Variable   Coefficient   Std. Error    t Statistic   Prob.
C           1.037880     0.074370      13.95563      0.0000
EDU         0.084037     0.005110      16.44509      0.0000
EXP         [illegible]  0.001163      [illegible]   0.0000
FE         -0.234934     0.026071      [illegible]   0.0000
NW         -0.124447     0.036340      -3.424498     0.0006
UN          0.207508     0.036265       5.721963     0.0000
WK          0.228725     0.028939       7.903647     0.0000

R-squared            0.376053      Mean dependent var.     2.342416
Adjusted R-squared   [illegible]   S.D. dependent var.     0.586356
S.E. of regression   0.464247      Akaike info criterion   1.308614
Sum squared resid.   [illegible]   Schwarz criterion       1.336645
Log likelihood       -836.4018     Hannan-Quinn criter.    [illegible]
F-statistic          [illegible]   Durbin-Watson stat.     1.977004
Prob. (F-statistic)  0.000000


To show this, suppose we estimate Eq. (13.11.1) and now want to find out whether the products of FE and NW,
FE and UN, and FE and WK should be added to the model to take into account the interaction between the
explanatory variables. Using the omitted-variables routine in EViews 6, we test the null hypothesis that
these three added variables have no effect on the estimated model.

As you would suspect, we can use the F test (discussed in Chapter 8) to assess the incremental, or marginal, 
contribution of the added variables and test the null hypothesis. For our example, the results are as follows: 


Table 13.5 Partial EViews Results Using Interactions


Omitted Variables: FE*NW FE*UN FE*WK 


F-statistic 0.805344 Prob. F (3,1279) 0.4909 
Log likelihood ratio 2.432625 Prob. chi-square (3) 0.4876 


We do not reject the null hypothesis that the interaction between female and non-white, female and union, 
and female and weekly wage earners, collectively, has no significant impact on the estimated model given 
in Table 13.4, for the estimated F value of 0.8053 is not statistically significant, the p value being about 49 
percent. 
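Table 13.5 reports both an F statistic and a log likelihood ratio. Both can be sketched from their textbook forms. The F function uses the Chapter 8 incremental form, and the first call uses hypothetical RSS values. The LR function is applied to the log likelihoods reported in Table 13.4 (without EXP²) and Table 13.6 (with EXP²), so it tests the squared-experience term discussed next.

```python
def incremental_f(rss_restricted, rss_unrestricted, m, n, k):
    """F test for m added regressors (Chapter 8 form):
    [(RSS_R - RSS_UR)/m] / [RSS_UR/(n - k)], with k coefficients in the
    unrestricted model."""
    return ((rss_restricted - rss_unrestricted) / m) / (rss_unrestricted / (n - k))


def likelihood_ratio(loglik_restricted, loglik_unrestricted):
    """LR statistic 2(lnL_UR - lnL_R); asymptotically chi-square with m df."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


# Hypothetical RSS values: m = 2 added regressors, n = 54, k = 4.
print("F =", incremental_f(120.0, 100.0, m=2, n=54, k=4))

# Log likelihoods reported in Tables 13.4 (without EXP^2) and 13.6 (with it).
lr_exp2 = likelihood_ratio(-836.4018, -811.9549)
print(f"LR for adding EXP^2: {lr_exp2:.4f} (1 df)")
```

An LR value of about 48.9 on 1 df is far above the 5 percent chi-square critical value of 3.84. This agrees with the highly significant EXP*EXP coefficient in Table 13.6.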

We leave it for the reader to try other combinations of the regressors to assess their contribution to the 
original model. 

Before proceeding further, note that model (13.11.1) assumes that the influence of experience on log wages
is linear; that is, holding other variables constant, the relative increase in wages (remember the regressand
is in log form) remains the same for every year’s increase in job experience. This assumption may be true
over some years of experience, but as basic labor economics suggests, as workers get older, the rate of wage
increase decreases. To see if this is the case in our example, we added the squared experience term to our
initial model and obtained the results shown in Table 13.6.


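The squared experience term is not only negative but also highly statistically significant. It also accords with
labor market behavior: over time, the rate of growth of wages slows down. From the estimates in Table 13.6,
the marginal effect of experience on log wages is

$$\frac{\partial \ln \text{wage}}{\partial \text{EXP}} = 0.036659 - 2(0.000611)\,\text{EXP} \approx 0.0367 - 0.0012\,\text{EXP}$$

which declines as experience increases.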
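The declining marginal effect can be tabulated from the two experience coefficients reported in Table 13.6. The sketch also computes the experience level at which the fitted quadratic peaks, −b_EXP/(2 b_EXP²). That turning-point formula is a standard property of a quadratic, applied here as an illustration.

```python
# Coefficients on EXP and EXP*EXP from Table 13.6.
b_exp, b_exp2 = 0.036659, -0.000611


def marginal_effect(exp_years: float) -> float:
    """d ln(wage) / d EXP = b_exp + 2 * b_exp2 * EXP."""
    return b_exp + 2.0 * b_exp2 * exp_years


# Experience at which the fitted log-wage profile peaks (marginal effect = 0).
turning_point = -b_exp / (2.0 * b_exp2)

for years in (0, 10, 20, 30):
    print(f"EXP = {years:>2}: marginal effect = {marginal_effect(years):+.5f}")
print(f"turning point ~ {turning_point:.1f} years")
```

Taken literally, the fitted profile peaks at about 30 years of experience. Beyond that point the quadratic implies slowly falling log wages.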


We take this opportunity to discuss the Akaike and Schwarz criteria. Like R², these are measures of the goodness
of fit of the estimated model; the difference is that under the R² criterion, the higher its value, the better the
model explains the behavior of the regressand. On the other hand, under the Akaike and Schwarz criteria, the
lower the value of these statistics, the better the model.

Of course, all these criteria are meaningful only when we want to compare two or more models. Thus, if you
compare the model in Table 13.4 with the model in Table 13.6, which has experience squared as an
additional regressor, you will see that the model in Table 13.6 is preferable to the one in Table 13.4 on the basis of
all three criteria (R², Akaike, and Schwarz).
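The Akaike and Schwarz values that EViews prints are not on the same scale as Eqs. (13.9.4) and (13.9.6). The sketch assumes they are computed from the log likelihood as −2 lnL/n + 2k/n and −2 lnL/n + k ln(n)/n. It then checks that these definitions reproduce the values reported in Tables 13.4 and 13.6 (k = 7 and k = 8 coefficients, n = 1,289). For OLS with normal errors, −2 lnL/n = 1 + ln 2π + ln(RSS/n), so the two versions differ only by a constant and rank models identically.

```python
import math


def eviews_style_aic(loglik: float, n: int, k: int) -> float:
    """Akaike criterion on the log-likelihood scale: -2 lnL/n + 2k/n."""
    return -2.0 * loglik / n + 2.0 * k / n


def eviews_style_sc(loglik: float, n: int, k: int) -> float:
    """Schwarz criterion on the log-likelihood scale: -2 lnL/n + k ln(n)/n."""
    return -2.0 * loglik / n + k * math.log(n) / n


n = 1289
models = {"Table 13.4": (-836.4018, 7), "Table 13.6": (-811.9549, 8)}
results = {}
for name, (loglik, k) in models.items():
    results[name] = (eviews_style_aic(loglik, n, k), eviews_style_sc(loglik, n, k))
    print(f"{name}: AIC = {results[name][0]:.6f}, SC = {results[name][1]:.6f}")
```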


Table 13.6 EViews Results with Experience Squared

Dependent Variable: LW
Method: Least Squares
Sample: 1-1,289
Included observations: 1,289

Variable   Coefficient   Std. Error    t Statistic   Prob.
C           0.912279     [illegible]   [illegible]   0.0000
EDU         0.079867     0.005051      15.81218      0.0000
EXP         0.036659     0.003800       9.647230     0.0000
FE         -0.228848     0.025606      -8.937218     0.0000
NW         -0.121805     0.035673      -3.414458     0.0007
UN          0.199957     0.035614      [illegible]   0.0000
WK          0.222549     0.028420       7.830675     0.0000
EXP*EXP    -0.000611     8.68E-05      -7.037304     0.0000

R-squared            [illegible]   Mean dependent var.     2.342416
Adjusted R-squared   0.395995      S.D. dependent var.     0.586356
S.E. of regression   0.455703      Akaike info criterion   1.272234
Sum squared resid.   266.0186      Schwarz criterion       1.304269
Log likelihood       -811.9549     Hannan-Quinn criter.    1.284259
F-statistic          [illegible]   Durbin-Watson stat.     [illegible]
Prob. (F-statistic)  0.000000


Incidentally, note that in both models the R² values seem “low,” but such low values are typically observed
in cross-section data with a large number of observations. However, note that this “low” R² value is
statistically significant, since in both models the computed F statistic is highly significant (recall the relationship
between F and R² discussed in Chapter 8).

Let us continue with the expanded model given in Table 13.6. Although the model looks satisfactory, let 
us explore a couple of points. First, since we are dealing with cross-section data, there is every chance that 
the model suffers from heteroscedasticity. So, we need to find out if this is the case. We applied several of
the tests of heteroscedasticity discussed in Chapter 11 and found that the model does in fact suffer from
heteroscedasticity. The reader should verify this assertion.

To correct for the observed heteroscedasticity, we can obtain White's heteroscedasticity-consistent standard
errors, which were discussed in Chapter 11. The results are given in Table 13.7.

As you would expect, there are some changes in the estimated standard errors, although this does not 
change the conclusion that all the regressors are important, both individually as well as collectively, in 
explaining the behavior of relative wages. 
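For a two-variable regression, White's estimator of the slope variance has the simple form Σ(X_i − X̄)²û_i² / [Σ(X_i − X̄)²]². The sketch below implements this basic (HC0) version on a tiny hypothetical data set and prints it next to the conventional OLS standard error. Some packages apply small-sample adjustments, so their reported figures need not match this formula exactly. The function name is illustrative.

```python
import math


def slope_standard_errors(x, y):
    """Return (b2, conventional SE, White HC0 SE) for Y = b1 + b2 X + u."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dev)
    b2 = sum(d * (yi - y_bar) for d, yi in zip(dev, y)) / sxx
    b1 = y_bar - b2 * x_bar
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]

    sigma2_hat = sum(u * u for u in resid) / (n - 2)       # homoscedastic case
    se_ols = math.sqrt(sigma2_hat / sxx)
    se_white = math.sqrt(sum(d * d * u * u for d, u in zip(dev, resid))) / sxx
    return b2, se_ols, se_white


b2, se_ols, se_white = slope_standard_errors([0, 1, 2, 3], [0, 2, 2, 4])
print(f"b2 = {b2:.3f}, OLS SE = {se_ols:.4f}, White (HC0) SE = {se_white:.4f}")
```

As in Table 13.7, the correction changes only the standard errors. The coefficient estimate is the same under both formulas.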

Let us now examine if the error terms are normally distributed. The histogram of the residuals obtained
from the model in Table 13.7 is shown in Figure 13.5. The Jarque–Bera statistic rejects the hypothesis that
the errors are normally distributed, for the JB statistic is high and the p value is practically zero. Note that for
a normally distributed variable, the skewness and kurtosis coefficients are, respectively, 0 and 3.
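The Jarque–Bera statistic combines these two coefficients as JB = n[S²/6 + (K − 3)²/24]. Under normality it is asymptotically χ² with 2 df. The sketch below recomputes JB from the skewness and kurtosis reported with Figure 13.5. It also shows moment-based definitions of S and K; whether the software uses exactly these definitions is an assumption.

```python
def skewness_kurtosis(values):
    """Moment-based sample skewness and kurtosis (kurtosis is 3 for a normal)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def jarque_bera(n, skew, kurt):
    """JB = n [S^2/6 + (K - 3)^2/24]; asymptotically chi-square with 2 df."""
    return n * (skew ** 2 / 6.0 + (kurt - 3.0) ** 2 / 24.0)


# Summary statistics reported with Figure 13.5.
jb = jarque_bera(1289, 1.721323, 10.72500)
print(f"JB = {jb:.1f}")   # compare with the 5 percent chi-square(2) value, 5.99

# The moment function on a small symmetric, hypothetical series.
s, k = skewness_kurtosis([-1.0, 1.0, -1.0, 1.0])
print(f"skewness = {s:.3f}, kurtosis = {k:.3f}")
```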


Table 13.7 EViews Results Using White’s Corrected Standard Errors

Dependent Variable: LW
Method: Least Squares
Sample: 1-1,289
Included observations: 1,289
White’s Heteroscedasticity-Consistent Standard Errors and Covariance

Variable   Coefficient   Std. Error    t Statistic   Prob.
C           0.912279     [illegible]   [illegible]   0.0000
EDU         0.079867     0.005640      14.15988      0.0000
EXP         0.036659     0.003789       9.675724     0.0000
FE         -0.228848     0.025764      -8.882625     0.0000
NW         -0.121805     0.033698      -3.614573     0.0003
UN          0.199957     0.029985       6.668458     0.0000
WK          0.222549     [illegible]   [illegible]   0.0000
EXP*EXP    -0.000611     9.44E-05      -6.470218     0.0000

R-squared            [illegible]   Mean dependent var.     2.342416
Adjusted R-squared   0.395995      S.D. dependent var.     0.586356
S.E. of regression   0.455703      Akaike info criterion   1.272234
Sum squared resid.   266.0186      Schwarz criterion       1.304269
Log likelihood       -811.9549     Hannan-Quinn criter.    1.284259
F-statistic          [illegible]   Durbin-Watson stat.     [illegible]
Prob. (F-statistic)  0.000000


Now what? Our hypothesis testing procedure thus far has rested on the assumption that the disturbance, 
or error, term in the regression model is normally distributed. Does this mean that we cannot legitimately use 
the t and F tests to test hypotheses in our wage regression? 

The answer is no. As noted in the chapter, the OLS estimators are asymptotically normally distributed,
provided that the error term has finite variance, is homoscedastic, and has a mean value of zero given
the values of the explanatory variables. As a result, we can continue to use the usual t and F tests, provided
the sample is reasonably large. In passing, it may be noted that we did not need the normality assumption to obtain
the OLS estimators. Even without the normality assumption, the OLS estimators are best linear unbiased estimators
(BLUE) under the Gauss–Markov assumptions.
Series: RESID
Sample: 1-1,289
Observations: 1,289

Mean          -9.38e-09
Median        -0.850280
Maximum       48.92719
Minimum       -20.58590
Std. Dev.     6.324574
Skewness      1.721323
Kurtosis      10.72500
Jarque-Bera   3841.617
Probability   0.000000

Figure 13.5 A histogram of the residuals obtained from the regression in Table 13.7


How large is a large sample? There is no definitive answer to this question, but the sample size of 1,289 
observations in our wage regression seems reasonably large. 

Are there any “outliers” in our wage regression? Some idea about this can be gleaned from the graph
in Figure 13.6, which gives the actual and estimated values of the dependent variable (ln wage) and the
residuals, which are the differences between the actual and estimated values of the regressand.

Although the mean value of the residuals is always zero (why?), the graph in Figure 13.6 shows that there 
are several residuals that seem large (in absolute value) compared with the bulk of the residuals. It is possible 
that there are outliers in the data. We provide the raw statistics on the three quantitative variables in Table 13.8 
to aid the reader in deciding whether there are indeed outliers. 


Figure 13.6 Residuals vs estimated values of the dependent variable, ln wage


Table 13.8
Sample: 1-1,289

                  W              EDU            EXP
Mean              [illegible]    13.14507       18.78976
Median            10.08000       12.00000       18.00000
Maximum           64.08000       20.00000       56.00000
Minimum           0.840000       0.000000       0.000000
Std. Dev.         7.896350       2.818823       11.66284
Skewness          1.848114       -0.290383      0.375669
Kurtosis          7.836565       5.977464       2.327946
Jarque-Bera       1990.134       494.2552       54.57664
Probability       0.000000       0.000000       0.000000
Sum               [illegible]    16944.00       24220.00
Sum Sq. Dev.      80309.82       [illegible]    175196.0
Observations      1289           1289           1289
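One informal way to follow up on the question of outliers is to flag residuals that lie many standard deviations from their mean. The sketch below is a hypothetical helper, not a procedure given in the chapter. The cutoff of 3 standard deviations is an arbitrary assumption, and a large residual by itself says nothing about leverage or influence.

```python
import statistics


def flag_large_residuals(residuals, threshold=3.0):
    """Return the indices of residuals lying more than `threshold` sample
    standard deviations away from the mean of the residuals."""
    mean = statistics.fmean(residuals)
    sd = statistics.stdev(residuals)  # sample (n - 1) standard deviation
    return [i for i, e in enumerate(residuals) if abs(e - mean) / sd > threshold]


# From Figure 13.5: largest residual divided by the residual standard deviation
largest_in_sd_units = round(48.92719 / 6.324574, 2)
print(largest_in_sd_units)

# Hypothetical residuals: twenty small values followed by one large value
example = [0.5, -0.3, 0.2, -0.4, 0.1] * 4 + [9.0]
print(flag_large_residuals(example))
```

By this crude yardstick the largest wage-regression residual lies roughly 7.7 standard deviations above zero.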


2. Real Consumption Function for the United States, 1947-2000 
In Chapter 10 we considered the consumption function for the U.S. for the years 1947-2000. The specific
form of the consumption function we considered was:

$$\ln TC_t = \beta_1 + \beta_2 \ln YD_t + \beta_3 \ln W_t + \beta_4\,\mathrm{Interest}_t + u_t \qquad (13.11.3)$$

where TC, YD, W, and Interest are, respectively, total consumption expenditure, personal disposable income,
wealth, and interest rate, all in real terms. The results based on our data are as follows:


Table 13.9 Results of Regression Equation (13.11.3)

Method: Least Squares
Sample: 1947-2000
Included observations: 54

                 Coefficient    Std. Error    t Statistic    Prob.
C                -0.467711      0.042778      -10.93343      0.0000
LOG(YD)          0.804873       0.017498      45.99836       0.0000
LOG(WEALTH)      0.201270       0.017593      11.44060       0.0000
INTEREST         -0.002689      0.000762      -3.529265      0.0009

R-squared             0.999560    Mean dependent var.      7.826093
Adjusted R-squared    0.999533    S.D. dependent var.      0.552368
S.E. of regression    0.011934    Akaike info criterion    -5.947703
Sum squared resid.    0.007121    Schwarz criterion        -5.800371
Log likelihood        164.5880    Hannan-Quinn criter.     -5.890883
F-statistic           37832.59    Durbin-Watson stat.      1.289219
Prob(F-statistic)     0.000000
Since TC, YD, and Wealth enter in logarithmic form, the estimated slope coefficients of YD and Wealth
are, respectively, income and wealth elasticities. As you would expect, these elasticities are positive and are 
highly statistically significant. Numerically, the income and wealth elasticities are about 0.80 and 0.20. The 
coefficient of the interest rate variable represents semielasticity (why?). Holding other variables constant, the 
results show that if the interest rate goes up by 1 percentage point, on average, real consumption expenditure 
goes down by about 0.27 percent. Note that the estimated semielasticity is also highly statistically significant. 
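The semielasticity arithmetic can be checked directly. Because ln TC is regressed on the level of the interest rate, a 1-point rise in the rate changes ln TC by β4, which is approximately 100·β4 percent. The exact change implied by the log-level form is 100·(e^β4 - 1) percent. This sketch assumes, as the text implies, that the rate is measured in percentage points.

```python
import math

b_interest = -0.002689  # coefficient of INTEREST in Table 13.9

# Approximation used in the text: about 100 * b percent per 1-point rise in the rate
approx_pct = 100 * b_interest
# Exact percentage change implied by a log-level specification
exact_pct = 100 * (math.exp(b_interest) - 1)

# The log-log slopes need no conversion: they are elasticities directly
# (e.g., a 1 percent rise in YD goes with about a 0.80 percent rise in TC).
print(f"approximate: {approx_pct:.4f}%, exact: {exact_pct:.4f}%")
```

For a coefficient this small the approximation and the exact figure differ only in the fourth decimal place, which is why the text's "about 0.27 percent" is adequate.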

Look at some of the summary statistics. The R² value is very high, almost reaching 100 percent. The F
value is also highly statistically significant, suggesting that, not only individually but also collectively, the
explanatory variables have a significant impact on consumption expenditure.

The Durbin-Watson statistic, however, suggests that the errors in the model are serially correlated. If we
consult the Durbin-Watson tables (Table D.5 in Appendix D), we see that for 55 observations (the closest
number to 54) and three explanatory variables, the lower and upper 5 percent critical d values are 1.452 and
1.681. Since the observed d in our example, 1.2892, is below the lower critical d value, we may conclude
that the errors in our consumption function are positively autocorrelated. This should not be a surprising
finding, for many time series regressions suffer from autocorrelation.

But before we accept this conclusion, let us find out if there are any specification errors. As we know,
autocorrelation is sometimes only apparent, arising because we have omitted some important variables. To see
if this is the case, consider the regression given in Table 13.10.


Table 13.10

Dependent Variable: LTC
Method: Least Squares
Sample: 1947-2000
Included observations: 54

                 Coefficient    Std. Error    t Statistic    Prob.
C                2.689644       0.566034      4.7517         0.0000
LYD              0.512836       0.054056      9.487076       0.0000
LW               -0.205281      0.074068      -2.771510      0.0079
INTEREST         -0.001162      0.000661      -1.759143      0.0848
LYD*LW           0.039901       0.007141      5.587986       0.0000

R-squared             0.999731    Mean dependent var.      7.826093
Adjusted R-squared    0.999709    S.D. dependent var.      0.552368
S.E. of regression    0.009421    Akaike info criterion    -6.403689
Sum squared resid.    0.004349    Schwarz criterion        -6.219524
Log likelihood        177.8996    Hannan-Quinn criter.     -6.332663
F-statistic           45534.94    Durbin-Watson stat.      1.530268
Prob(F-statistic)     0.000000


The additional variable in this model is the interaction of the logs of disposable income and wealth.
This interaction term is highly significant. Notice that the interest variable has now become less significant
(p value of about 8.5 percent), although it retains its negative sign. Moreover, the Durbin-Watson d value has
increased from about 1.29 to about 1.53.

The 5 percent critical d values now are 1.378 and 1.721. The observed d value of 1.53 lies between these
values, suggesting that, on the basis of the Durbin-Watson statistic alone, we cannot determine whether or not
we have autocorrelation. However, as noted in the chapter on autocorrelation, some authors suggest using the
upper limit of the d statistic as approximately the true significance limit; in that case, if the computed d value
is below the upper limit, there is evidence of positive autocorrelation. By that criterion, in the present instance
we can conclude that our model suffers from positive autocorrelation.
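The Durbin-Watson computations and decision rules used for Tables 13.9 and 13.10 can be written out as a short sketch. `durbin_watson` implements d = Σ(e_t - e_{t-1})²/Σe_t². `dw_decision` applies the classic bounds test against positive autocorrelation only; a test against negative autocorrelation would use 4 - d instead. `modified_dw_decision` applies the upper-limit rule described above. The function names and return strings are illustrative.

```python
def durbin_watson(residuals):
    """Durbin-Watson d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def dw_decision(d, d_lower, d_upper):
    """Classic bounds test of H0: no positive first-order autocorrelation."""
    if d < d_lower:
        return "reject H0: positive autocorrelation"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"


def modified_dw_decision(d, d_upper):
    """Modified d test: treat the upper bound d_U as the approximate critical value."""
    if d < d_upper:
        return "evidence of positive autocorrelation"
    return "no evidence of positive autocorrelation"


# Table 13.9: d = 1.289219 with 5 percent bounds (1.452, 1.681)
print(dw_decision(1.289219, 1.452, 1.681))
# Table 13.10: d = 1.530268 with the bounds used in the text (1.378, 1.721)
print(dw_decision(1.530268, 1.378, 1.721))
print(modified_dw_decision(1.530268, 1.721))
```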

We also applied the Breusch-Godfrey test of autocorrelation that we discussed in Chapter 12. Adding the 
two lagged terms of the estimated residuals in Equation (12.6.15) to the model in Table 13.9, we obtained the 
following results: 


Table 13.11

Breusch-Godfrey Serial Correlation LM Test:

F-statistic        3.2541      Prob. F(2,48)          0.0473
Obs*R-squared      6.447576    Prob. chi-square(2)    0.0398

Dependent Variable: RESID
Method: Least Squares
Sample: 1947-2000
Included observations: 54
Presample missing value lagged residuals set to zero.

                 Coefficient    Std. Error     t Statistic    Prob.
C                -0.006514      0.041528       -0.156851      0.8760
LYD              -0.004197      [illegible]    -0.244619      0.8078
LW               0.004191       [illegible]    0.242674       0.8093
INTEREST         0.000116       0.000736       0.156964       0.8759
RESID(-1)        0.385190       0.151581       2.541147       [illegible]
RESID(-2)        -0.165609      0.154695       -1.070556      0.2897

R-squared             0.119400    Mean dependent var.      -9.02E-17
Adjusted R-squared    0.027670    S.D. dependent var.      0.011591
S.E. of regression    0.011430    Akaike info criterion    -6.000781
Sum squared resid.    0.006271    Schwarz criterion        -5.779782
Log likelihood        168.0211    Hannan-Quinn criter.     -5.915550
F-statistic           1.301653    Durbin-Watson stat.      1.848014
Prob(F-statistic)     0.279040


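The F statistic reported at the top tests the hypothesis that the coefficients of the two lagged residuals
included in the model are zero. This hypothesis is rejected, because the F value is significant at about the
5 percent level (p value 0.0473); the Obs*R-squared version of the test leads to the same conclusion (p value 0.0398).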
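The statistics at the top of Table 13.11 follow from the R² of the auxiliary regression. The LM statistic is n·R², asymptotically chi-square with p df, and the F version has (p, n - k - p) df. The sketch below assumes k = 4 coefficients in the original model and p = 2 lagged residuals, as in the text. For p = 2 both tail probabilities have closed forms, so no statistics library is needed. The function names are ours, and `breusch_godfrey` pads presample residuals with zeros to mirror the note under the table.

```python
import math

import numpy as np


def bg_stats_from_r2(r2, n, k, p):
    """LM = n * R^2 and F = (R^2 / p) / ((1 - R^2) / (n - k - p))."""
    lm = n * r2
    f = (r2 / p) / ((1.0 - r2) / (n - k - p))
    return lm, f


def p_values_two_lags(lm, f, df_denom):
    """Exact tail probabilities when p = 2: chi-square(2) and F(2, df_denom)."""
    p_chi2 = math.exp(-lm / 2.0)
    p_f = (1.0 + 2.0 * f / df_denom) ** (-df_denom / 2.0)
    return p_chi2, p_f


def breusch_godfrey(y, X, p):
    """Breusch-Godfrey test from raw data; X must include a column of ones."""
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    # e_{t-1}, ..., e_{t-p}, with presample values set to zero
    lags = [np.concatenate([np.zeros(j), e[:-j]]) for j in range(1, p + 1)]
    Z = np.column_stack([X] + lags)
    gamma = np.linalg.lstsq(Z, e, rcond=None)[0]
    v = e - Z @ gamma
    ec = e - e.mean()
    r2 = 1.0 - (v @ v) / (ec @ ec)
    return bg_stats_from_r2(r2, n, k, p)


# Top panel of Table 13.11 from the auxiliary R-squared (n = 54, k = 4, p = 2)
lm, f = bg_stats_from_r2(0.119400, n=54, k=4, p=2)
p_chi2, p_f = p_values_two_lags(lm, f, df_denom=48)
print(f"Obs*R-squared = {lm:.4f} (p = {p_chi2:.4f}); F = {f:.4f} (p = {p_f:.4f})")
```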

To sum up, it seems that there is autocorrelation in the error term. We can apply one or more procedures 
discussed in Chapter 12 to remove autocorrelation. But to save space, we leave that task to the reader. 

In Table 13.12 we report the results of the same regression with HAC, or Newey-West, standard errors,
which take the autocorrelation into account. Because HAC standard errors are justified only in large samples,
note that our sample of 54 observations is probably large enough for their use.

If you compare these results with those given in Table 13.9, you will observe that the regression coefficients
remain the same (apart from tiny differences in the last reported digit), but that the standard errors are somewhat different.
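A bare-bones version of the Newey-West calculation may help show what "taking autocorrelation into account" means. The coefficients are ordinary OLS; only the covariance matrix changes. The sketch assumes the Bartlett kernel with a user-chosen number of lags (Table 13.12 uses 3) and applies no small-sample correction, so its numbers need not match EViews exactly. The data in the usage example are simulated, not the consumption data.

```python
import numpy as np


def newey_west_se(y, X, lags):
    """OLS estimates with Newey-West (Bartlett kernel) HAC standard errors.

    V = (X'X)^-1 S (X'X)^-1, where
    S = sum_t e_t^2 x_t x_t' + sum_{l=1..L} w_l (G_l + G_l'),
    G_l = sum_{t>l} e_t e_{t-l} x_t x_{t-l}', and w_l = 1 - l / (L + 1).
    No degrees-of-freedom correction is applied.
    """
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    e = y - X @ beta
    xe = X * e[:, None]              # row t holds x_t * e_t
    S = xe.T @ xe                    # lag-0 (White) term
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1.0)   # Bartlett weight
        G = xe[l:].T @ xe[:-l]       # sum over t of (x_t e_t)(x_{t-l} e_{t-l})'
        S += w * (G + G.T)
    V = xtx_inv @ S @ xtx_inv
    return beta, np.sqrt(np.diag(V))


# Hypothetical example: one regressor and AR(1) errors, n = 54
rng = np.random.default_rng(3)
n = 54
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones(n), x])
beta, se_hac = newey_west_se(y, X, lags=3)
print(beta, se_hac)
```

With `lags=0` the function reduces to White's heteroscedasticity-consistent (HC0) standard errors, which is a convenient check on the implementation.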

In this chapter we discussed Chow’s prediction failure test. Our sample period extends from 1947 to 2000.
Over this period there have been several business cycles, mostly of short duration. For example, there was a
recession in 1990-1991 and another beginning in 2001. Is the behavior of consumer expenditure in relation to
income, wealth, and the interest rate different during recessions?
Table 13.12

Dependent Variable: LTC
Method: Least Squares
Sample: 1947-2000
Included observations: 54
Newey-West HAC Standard Errors and Covariance (lag truncation = 3)

                 Coefficient    Std. Error     t Statistic    Prob.
C                -0.467714      0.043937       -10.64516      0.0000
LYD              0.804871       [illegible]    [illegible]    0.0000
LW               0.201272       0.015447       13.02988       0.0000
INTEREST         -0.002689      0.000880       -3.056306      0.0036

R-squared             0.999560    Mean dependent var.      7.826093
Adjusted R-squared    0.999533    S.D. dependent var.      0.552368
S.E. of regression    0.011934    Akaike info criterion    -5.947707
Sum squared resid.    0.007121    Schwarz criterion        -5.800374
Log likelihood        164.5881    Hannan-Quinn criter.     -5.890886
F-statistic           37832.71    Durbin-Watson stat.      1.289237
Prob(F-statistic)     0.000000


To shed light on this question, let us consider the 1990 recession and apply Chow’s prediction failure test. 
The details of this test have already been discussed in the chapter. Using Chow’s predictive failure test in 
EViews, version 6, we obtain the results given in Table 13.13. 


Table 13.13 Chow’s Test of Predictive Failure

Chow’s Forecast Test: Forecast from 1991 to 2000

F-statistic             1.957745    Prob. F(10,40)          0.0652
Log likelihood ratio    21.51348    Prob. chi-square(10)    0.0178

Dependent Variable: LTC
Method: Least Squares
Sample: 1947-1990
Included observations: 44

                 Coefficient    Std. Error     t Statistic    Prob.
C                -0.287952      0.095089       -3.028236      0.0043
LYD              [illegible]    0.028473       29.96474       0.0000
LW               [illegible]    [illegible]    4.277239       0.0001
INTEREST         -0.002060      0.000804       -2.562790      0.0143

R-squared             0.999496    Mean dependent var.      7.659729
Adjusted R-squared    0.999458    S.D. dependent var.      0.469580
S.E. of regression    0.010933    Akaike info criterion    -6.107640
Sum squared resid.    0.004781    Schwarz criterion        -5.945441
Log likelihood        138.3681    Hannan-Quinn criter.     -6.047489
F-statistic           26430.49    Durbin-Watson stat.      1.262748
Prob(F-statistic)     0.000000
The F statistic given in the top portion of Table 13.13 suggests that there probably is not a substantial
difference in the consumption function pre- and post-1990, for its p value (0.0652) is not significant at the
5 percent level. But if you choose the 10 percent level of significance, the F value is statistically significant.
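The predictive-failure F statistic can be recomputed from two residual sums of squares that are already reported: the full-sample regression (Table 13.9) and the 1947-1990 regression (Table 13.13). The sketch assumes the standard form F = [(RSS_R - RSS_1)/n2] / [RSS_1/(n1 - k)] with (n2, n1 - k) df. The function name is ours.

```python
def chow_forecast_f(rss_full, rss_first, n1, n2, k):
    """Chow predictive-failure F statistic with (n2, n1 - k) degrees of freedom.

    rss_full:  RSS from the regression over all n1 + n2 observations
    rss_first: RSS from the regression over the first n1 observations only
    n2:        number of observations in the forecast period
    k:         number of estimated coefficients, including the intercept
    """
    return ((rss_full - rss_first) / n2) / (rss_first / (n1 - k))


# Sum squared resid. from Table 13.9 (1947-2000) and Table 13.13 (1947-1990)
f = chow_forecast_f(rss_full=0.007121, rss_first=0.004781, n1=44, n2=10, k=4)
print(f"F = {f:.4f} with (10, 40) df")
```

The result agrees with the 1.957745 reported in Table 13.13 up to the rounding of the printed sums of squares.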

We can look at this problem differently. In Chapter 8 we discussed a test of parameter stability. To see if 
there has been any statistically significant change in the consumption function regression coefficients, we 
used the Chow test discussed in Section 8.7 of Chapter 8 and obtained the results given in Table 13.14. 


Table 13.14 Chow’s Test of Parameter Stability

Chow Breakpoint Test: 1990
Null Hypothesis: No breaks at specified breakpoints
Varying regressors: All equation variables
Equation Sample: 1947-2000

F-statistic             4.254054    Prob. F(4,46)          0.0052
Log likelihood ratio    16.99654    Prob. chi-square(4)    0.0019
Wald statistic          17.01622    Prob. chi-square(4)    0.0019


It appears that the consumption functions pre- and post-1990 are statistically different, for the
computed F statistic, following Eq. (8.7.4), is highly statistically significant: its p value is only
0.0052.
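The three statistics in Table 13.14 are different transformations of the same restricted and unrestricted residual sums of squares. The sketch below assumes the classical forms: F with (k, n - 2k) df, a Wald form equal to k·F, and a likelihood-ratio form n·ln(RSS_R/RSS_U). For example, with F = 4.254054 and k = 4, k·F is about 17.016. The RSS values in the usage line are hypothetical, because the two subperiod sums of squares are not reported here.

```python
import math


def chow_breakpoint(rss_pooled, rss_1, rss_2, n, k):
    """Chow breakpoint test from residual sums of squares.

    rss_pooled: RSS from one regression over the whole sample (restricted)
    rss_1, rss_2: RSS from separate regressions on the two subperiods
    n: total observations; k: coefficients per regression, including the intercept
    Returns (F with (k, n - 2k) df, Wald = k * F, LR = n * ln(RSS_R / RSS_U)).
    """
    rss_u = rss_1 + rss_2
    f = ((rss_pooled - rss_u) / k) / (rss_u / (n - 2 * k))
    wald = k * f
    lr = n * math.log(rss_pooled / rss_u)
    return f, wald, lr


# Hypothetical RSS values, for illustration only
f, wald, lr = chow_breakpoint(rss_pooled=10.0, rss_1=5.0, rss_2=3.0, n=54, k=4)
print(f"F = {f:.4f}, Wald = {wald:.4f}, LR = {lr:.4f}")
```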

The reader is encouraged to apply Chow’s parameter stability and predictive failure tests to determine 
if the consumption function pre- and post-2000 has changed. To do this, you will have to extend the data 
beyond 2000. Also note that to apply these tests the number of observations must be greater than the number 
of coefficients estimated. 

We have not exhausted all of the diagnostic tests that we can apply to our consumption data, but the analysis
provided thus far should give you a fairly good idea about how one can apply the various tests.


13.12 Non-Normal Errors and Stochastic Regressors 


In this section we discuss two topics that are of a somewhat advanced nature, namely, non-normal distribution 
of the error term, and stochastic, or random, regressors and their practical importance. 


1. What Happens if the Error Term is Not Normally Distributed?


In the classical normal linear regression model (CNLRM) discussed in Chapter 4, we assumed that the error
term u follows the normal distribution. We invoked the central limit theorem (CLT) to justify the normality
assumption. Because of this assumption, we were able to establish that the OLS estimators are also normally
distributed. As a result, we were able to do hypothesis testing using the t and F tests regardless of the sample
size. We also discussed using the Jarque-Bera and Anderson-Darling normality tests to find out whether the
residuals, our estimates of the errors, are normally distributed in any practical application.

What happens if the errors are not normally distributed? The OLS estimators are still BLUE; that is, they
are unbiased, and in the class of linear unbiased estimators they have minimum variance. Intuitively, this should
not be surprising, for to establish the Gauss-Markov (BLUE) theorem we did not need the normality assumption.
Then what is the problem?

The problem is that we need the sampling, or probability, distributions of the OLS estimators. Without
them we cannot engage in any kind of hypothesis testing regarding the true values of the parameters being
estimated. As shown in Chapters 3 and 7, the OLS estimators are linear functions of the dependent variable Y,
and Y itself is a linear function of the stochastic error term u, assuming that the explanatory variables are
nonstochastic, or fixed in repeated sampling. Ultimately, then, we need the probability distribution of u.

As noted above, the classical normal linear regression model (CNLRM) assumes that the error term
follows the normal distribution (with zero mean and constant variance). Using the central limit theorem
(CLT) to justify the normality of the error term, we were able to show that the OLS estimators themselves are
normally distributed with the means and variances discussed in Chapters 4 and 7. This in turn allowed us to use
the t and F statistics in hypothesis testing in small, or finite, samples as well as in large samples. Therefore,
the role of the normality assumption is critical, especially in small samples.

But what if we cannot maintain the normality assumption on the basis of various normality tests? What 
then? We have two choices. The first is bootstrapping and the second is to invoke large, or asymptotic, 
sample theory. 

A discussion of bootstrapping, which is gradually seeping into applied econometrics, would take us too far
afield. The basic idea underlying bootstrapping is to churn (or regurgitate) a given sample over and over again,
that is, to resample from it repeatedly, and thereby obtain the sampling distributions of the parameters of
interest (the OLS estimators for our purpose). How this is done in practice is best left to the references.52 By
the way, the term bootstrapping comes from the commonly used expression “to pull oneself up by one’s own bootstraps.”
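The chapter leaves the mechanics to the references. Purely as an illustration, here is one common variant, the "pairs" (case-resampling) bootstrap, applied to simulated data. The data-generating process, the 2,000 replications, and the function name are all assumptions of this sketch. Other schemes, such as resampling residuals, are equally standard.

```python
import numpy as np


def pairs_bootstrap_ols(y, X, n_boot=2000, seed=0):
    """Resample (y_i, x_i) rows with replacement and re-estimate OLS each time.

    Returns an (n_boot, k) array of coefficient estimates whose empirical
    distribution approximates the sampling distribution of the OLS
    estimators without assuming normal errors.
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    draws = np.empty((n_boot, k))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # row indices drawn with replacement
        draws[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    return draws


# Hypothetical data with skewed (centered exponential) errors
rng = np.random.default_rng(42)
n = 100
x = rng.uniform(0, 10, size=n)
u = rng.exponential(scale=2.0, size=n) - 2.0  # mean zero but non-normal
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones(n), x])

draws = pairs_bootstrap_ols(y, X)
se_slope = draws[:, 1].std(ddof=1)
ci_low, ci_high = np.percentile(draws[:, 1], [2.5, 97.5])
print(f"bootstrap SE of slope: {se_slope:.4f}; 95% percentile CI: ({ci_low:.3f}, {ci_high:.3f})")
```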

The other approach to dealing with non-normal error terms is to use asymptotic, or large-sample, theory. As
a matter of fact, a glimpse of this was given in Appendix 3A.7 in Chapter 3, where we showed that the OLS
estimators are consistent. As discussed in Appendix A, an estimator is consistent if it approaches the true
value of the parameter it estimates as the sample size gets larger and larger (see Figure A.11 in Appendix A).

But how does that help us in hypothesis testing? Can we still use the t and F tests? It can be shown that
under the Gauss-Markov assumptions53 the OLS estimators are asymptotically normally distributed with
the means and variances discussed in Chapters 4 and 7.54 As a result, the t and F tests developed under the
normality assumption are approximately valid in large samples, and the approximation improves as the
sample size increases.
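A small Monte Carlo experiment can make the large-sample argument concrete. The sketch below uses a hypothetical design: a regressor drawn once and then held fixed, plus centered exponential errors, which are skewed but satisfy the Gauss-Markov assumptions. It records how often a nominal 5 percent t test rejects a true null hypothesis. If the asymptotic argument holds, the rejection rate should approach 0.05 as n grows. The function name and all design constants are assumptions.

```python
import numpy as np


def rejection_rate(n, reps=2000, beta2=0.5, seed=1):
    """Share of replications in which a nominal 5% two-sided t test rejects
    the TRUE hypothesis H0: B2 = beta2 when the errors are non-normal."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 10, size=n)           # drawn once, then fixed in repeated samples
    X = np.column_stack([np.ones(n), x])
    xtx_inv = np.linalg.inv(X.T @ X)
    rejections = 0
    for _ in range(reps):
        u = rng.exponential(scale=1.0, size=n) - 1.0   # skewed, mean-zero errors
        y = 1.0 + beta2 * x + u
        b = xtx_inv @ X.T @ y
        e = y - X @ b
        s2 = e @ e / (n - 2)                 # unbiased estimate of the error variance
        se_b2 = np.sqrt(s2 * xtx_inv[1, 1])
        t = (b[1] - beta2) / se_b2
        rejections += int(abs(t) > 1.96)     # large-sample critical value
    return rejections / reps


for n in (10, 50, 500):
    print(n, rejection_rate(n))
```

With n = 10 the use of the normal critical value 1.96 itself inflates the rejection rate, so the small-sample figure mixes two effects. By n = 500 the rate should be close to the nominal 5 percent.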


2. Stochastic Explanatory Variables 
In Chapter 3 we introduced the classical linear (in parameter) regression model under some simplifying 
assumptions. One of the assumptions was that the explanatory variables, or regressors, were either fixed or 
non-stochastic, or if stochastic, they were independent of the error term. We called the former case the fixed 
regressor case and the latter the random regressor case. 


52For an informal discussion, see Christopher Z. Mooney and Robert D. Duval, Bootstrapping: A Nonparametric Approach to 
Statistical Inference, Sage University Press, California, 1993. For a more formal textbook discussion, see Russell Davidson and 
James G. MacKinnon, Econometric Theory and Methods, Oxford University Press, New York, 2004, pp. 159-166. 
53Recall the Gauss-Markov assumptions, namely, the expected value of the error term is zero, the error term and each
of the explanatory variables are independent, the error variance is homoscedastic, and there is no autocorrelation in the
error term. It is also assumed that the variance-covariance matrix of the explanatory variables is finite. We can also relax
the condition of independence between the error term and the regressors and assume the weaker condition that they are
uncorrelated.

54The proof of asymptotic normality of OLS estimators is beyond the scope of this book. See James H. Stock and Mark W.
Watson, Introduction to Econometrics, 2d ed., Pearson/Addison Wesley, Boston, 2007, pp. 710-711.
In the fixed regressor case, we already know the properties of the OLS estimators (see Chapters 5 and 8).
In the random regressor case, if we proceed with the assumption that our analysis is conditional on the given 
values of the regressors, the properties of OLS estimators that we have studied under the fixed regressor case 
continue to hold true. 

If in the random regressor case we assume that these regressors and the error term are independently
distributed, the OLS estimators are still unbiased but they are no longer efficient.55

Things get complicated if the error term is not normally distributed, or regressors are stochastic, or both.
Here it is difficult to make any general statements regarding the finite-sample properties of the OLS estimators.
However, under certain conditions, we can invoke the central limit theorem to establish the asymptotic
normality of OLS estimators. Although beyond the scope of this book, the proofs can be found elsewhere.56
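The unbiasedness claim for random regressors that are independent of the error term can be checked by simulation. In this hypothetical design, a new regressor sample is drawn in every replication, independently of a non-normal error (Student's t with 5 df). Averaging the slope estimates over many samples should give a value close to the true slope. The sketch says nothing about efficiency, and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
beta1, beta2, n, reps = 1.0, 0.5, 30, 5000
estimates = np.empty(reps)
for r in range(reps):
    x = rng.normal(5.0, 2.0, size=n)    # regressor redrawn in every sample (stochastic)
    u = rng.standard_t(df=5, size=n)    # non-normal error, independent of x
    y = beta1 + beta2 * x + u
    X = np.column_stack([np.ones(n), x])
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(f"mean of slope estimates: {estimates.mean():.4f} (true value {beta2})")
```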


13.13 A Word to the Practitioner 


We have covered a lot of ground in this chapter. There is no question that model building is an art as well as
a science. A practical researcher may be bewildered by theoretical niceties and an array of diagnostic tools.
But it is well to keep in mind Martin Feldstein’s caution that “The applied econometrician, like the theorist,
soon discovers from experience that a useful model is not one that is ‘true’ or ‘realistic’ but one that is
parsimonious, plausible and informative.”57

Peter Kennedy of Simon Fraser University in Canada advocates the following “Ten Commandments of
Applied Econometrics”:58

1. Thou shalt use common sense and economic theory.
2. Thou shalt ask the right questions (i.e., put relevance before mathematical elegance).
3. Thou shalt know the context (do not perform ignorant statistical analysis).
4. Thou shalt inspect the data.
5. Thou shalt not worship complexity. Use the KISS principle, that is, keep it stochastically simple.
6. Thou shalt look long and hard at thy results.
7. Thou shalt beware the costs of data mining.
8. Thou shalt be willing to compromise (do not worship textbook prescriptions).
9. Thou shalt not confuse significance with substance (do not confuse statistical significance with practical
significance).
10. Thou shalt confess in the presence of sensitivity (that is, anticipate criticism).


You may want to read Kennedy’s paper fully to appreciate the conviction with which he advocates the 
above ten commandments. Some of these commandments may sound tongue-in-cheek, but there is a grain 
of truth in each. 


55For technical details, see William H. Greene, Econometric Analysis, 6th ed., Pearson/Prentice-Hall, New Jersey, 2008, 
pp. 49-50. 

56See Greene, op. cit.

57Martin S. Feldstein, “Inflation, Tax Rules and Investment: Some Econometric Evidence,” Econometrica, vol. 50, 1982,
p. 829.

58Peter Kennedy, op. cit., pp. 17-18. 
Summary and Conclusions


1. The assumption of the CLRM that the econometric model used in analysis is correctly specified has two
meanings. One, there are no equation specification errors, and two, there are no model specification
errors. In this chapter the major focus was on equation specification errors.

2. The equation specification errors discussed in this chapter were (1) omission of an important variable(s),
(2) inclusion of a superfluous variable(s), (3) adoption of the wrong functional form, (4) incorrect
specification of the error term $u_i$, and (5) errors of measurement in the regressand and regressors.

3. When legitimate variables are omitted from a model, the consequences can be very serious: The
OLS estimators of the variables retained in the model are not only biased but inconsistent as well.
Additionally, the variances and standard errors of these coefficients are incorrectly estimated, thereby
vitiating the usual hypothesis-testing procedures.

4. The consequences of including irrelevant variables in the model are fortunately less serious: The
estimators of the coefficients of the relevant as well as “irrelevant” variables remain unbiased as well
as consistent, and the error variance $\sigma^2$ remains correctly estimated. The only problem is that the
estimated variances tend to be larger than necessary, thereby making for less precise estimation of the
parameters. That is, the confidence intervals tend to be wider than necessary.

5. To detect equation specification errors, we considered several tests, such as (1) examination of residuals,
(2) the Durbin-Watson d statistic, (3) Ramsey’s RESET test, and (4) the Lagrange multiplier test.

6. A special kind of specification error is errors of measurement in the values of the regressand and
regressors. If there are errors of measurement in the regressand only, the OLS estimators are unbiased
as well as consistent but they are less efficient. If there are errors of measurement in the regressors, the
OLS estimators are biased as well as inconsistent.

7. Even if errors of measurement are detected or suspected, the remedies are often not easy. The use
of instrumental or proxy variables is theoretically attractive but not always practical. Thus it is very
important in practice that the researcher be careful in stating the sources of his/her data, how they were
collected, what definitions were used, etc. Data collected by official agencies often come with several
footnotes and the researcher should bring those to the attention of the reader.

8. Model mis-specification errors can be as serious as equation specification errors. In particular, we
distinguished between nested and non-nested models. To decide on the appropriate model we discussed
the non-nested, or encompassing, F test and the Davidson-MacKinnon J test and pointed out the
limitations of each test.

9. In choosing an empirical model in practice researchers have used a variety of criteria. We discussed
some of those, such as the Akaike and Schwarz information criteria, Mallows’s $C_p$ criterion, and the
forecast $\chi^2$ criterion. We discussed the advantages and disadvantages of these criteria and also warned
the reader that these criteria are not absolute but are adjuncts to a careful specification analysis.

10. We also discussed these additional topics: (1) outliers, leverage, and influence; (2) recursive least
squares; and (3) Chow’s prediction failure test. We discussed the role of each in applied work.

11. We discussed briefly two special cases, namely, non-normality of the stochastic error term and random
regressors, and the role of asymptotic, or large, sample theory in situations where small, or finite, sample
properties of OLS estimators cannot be established.

12. We concluded this chapter by discussing Peter Kennedy’s “ten commandments of applied econometrics.”
The point of these commandments is to ask the researcher to look beyond the purely technical
aspects of econometrics.
Multiple Choice Questions


1. Which of these, according to Hendry and Richard, is not a criterion for choosing a model for empirical
analysis?
a. Be consistent with theory
b. Have regressors
c. Exhibit parameter constancy
d. Exhibit data coherency
2. Which of the following is NOT a cause of model specification errors?
a. Omitting a relevant variable
b. Including an irrelevant variable
c. Errors of measurement bias
d. Correct functional form
3. When a specification error arises because the true regression model is not known to begin with,
such an error is known as
a. Model mis-specification error
b. Model specification error
c. Wrong functional form
d. Error of measurement bias
4. The coefficients of an underfitted model are
a. Biased
b. Inconsistent
c. Inefficient
d. All of the above
5. The coefficients of an overfitted model are
a. Biased
b. Inconsistent
c. Inefficient
d. All of the above
6. Which of these is NOT a test of specification errors?
a. Davidson-MacKinnon test
b. Lagrange multiplier test
c. Durbin-Watson d statistic
d. Ramsey’s RESET test
7. If there are errors of measurement in the dependent variable Y, then the OLS estimators of the regression
model would be
a. Unbiased and efficient
b. Unbiased but inefficient
c. Biased and inefficient
d. Biased but efficient
8. If there are errors of measurement in the explanatory variable X, then the OLS estimators of the
regression model would be
a. Unbiased and consistent
b. Biased but consistent
c. Biased and inconsistent
d. Unbiased but inconsistent
9. Given Model A: $Y_i = \beta_1 + \beta_2 X_{2i} + u_i$ and Model B: $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$, which of the following
statements is TRUE?
a. Model A and Model B are nested models.
b. Model A is nested in Model B.
c. Model B is nested in Model A.
d. Model A and Model B are non-nested models.
10. Given Model A: $Y_i = \gamma_1 + \gamma_2 X_{2i} + u_i$ and Model B: $Y_i = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + u_i$, which of the following
statements is TRUE?
a. Model A and Model B are nested models.
b. Model A is nested in Model B.
c. Model B is nested in Model A.
d. Model A and Model B are non-nested models.
11. Given Model A: $Y_i = \beta_1 + \beta_2 X_{2i} + u_i$ and Model B: $Y_i = \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i$, which of the
following statements is TRUE?
a. Model A and Model B are nested models.
b. Model A is nested in Model B.
c. Model B is nested in Model A.
d. Model A and Model B are non-nested models.
12. Given Model A: $Y_i = \beta_1 + \beta_2 Z_i + u_i$ and Model B: $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$, which of the following
statements is TRUE?
a. Model A and Model B are nested models.
b. Model A is nested in Model B.
c. Model B is nested in Model A.
d. Model A and Model B are non-nested models.
13. One of the important criteria to check when choosing between two competing models on the basis of the
discrimination approach is that
a. The regressors must be the same
b. The regressand must be the same
c. The number of regressors must be the same
d. The number of observations must be the same
14. Under the discerning approach, the choice of the reference hypothesis could determine the outcome of
the model choice. This is especially so if the competing regressors are
a. Highly autocorrelated
b. All equal to zero
c. Correlated
d. Incorrectly specified
15. Which of the following statements with regard to R² is FALSE?
a. It is a measure of in-sample goodness of fit.
b. In comparing two or more R² values, the regressand must be the same.
c. For comparative purposes, R² is a better measure than adjusted R².
d. R² does not decrease when more explanatory variables are added to the model.
16. According to Akaike’s information criterion (AIC), when comparing two or more models, the model
selected is the one with the
a. Lowest AIC value
b. Highest AIC value
c. AIC value > 1
d. AIC value > 0


17. Which of the following measures is not suitable for out-of-sample forecasting?
a. AIC
b. SIC
c. [illegible]
d. All of the above
18. An observation with a large residual is a(n)
a. Leverage point
b. Outlier
c. Influence point
d. Missing data
19. A data point that is disproportionately distant from the bulk of the values of a regressor(s) is a(n)
a. Leverage point
b. Outlier
c. Influence point
d. Missing data
20. Recursive least squares is used to
a. Test for maximizing the R² value
b. Test for specification bias
c. Test for a structural break in the sample data
d. Test for the out-of-sample prediction power of the model
21. If errors are not normally distributed, then the OLS estimators are
a. Biased
b. Non-linear
c. Inefficient
d. Still BLUE, but the t and F tests are invalid
22. The bootstrapping technique is used to
a. Test for specification bias
b. Obtain the sampling distribution of parameters of interest
c. Test for autocorrelation
d. Test for normality of the error term
23. The OLS estimators are asymptotically normally distributed. As a result, the t and F tests are valid.
This statement is
a. True for very large samples
b. True for small samples
c. True irrespective of sample size
d. False
24. One of the assumptions of the CLRM is that the regressors are fixed, but if we find our regressors to be
random and distributed independently of the error terms, the OLS estimators are
a. Biased but efficient
b. Unbiased and efficient
c. Unbiased but inefficient
d. Biased and inefficient
25. The consequences of including irrelevant variables in a model are more serious than those of omitting a
relevant variable from the model. This statement is
a. True
b. False
c. Dependent on the nested model
d. Dependent on the non-nested model


540 Basic Econometrics 


Exercises 


Questions 


13.1. Refer to the demand function for potatoes estimated in Eq. (8.6.23). Considering the attributes of
a good model discussed in Section 13.1, could you say that this demand function is “correctly”
specified?

13.2. Suppose that the true model is

$$Y_i = \beta_1 X_i + u_i \qquad (1)$$

but instead of fitting this regression through the origin you routinely fit the usual intercept-present
model:

$$Y_i = \alpha_0 + \alpha_1 X_i + v_i \qquad (2)$$

Assess the consequences of this specification error.

13.3. Continue with Exercise 13.2 but assume that it is model (2) that is the truth. Discuss the consequences
of fitting the mis-specified model (1).

13.4. Suppose that the “true” model is

$$Y_i = \beta_1 + \beta_2 X_{2i} + u_i \qquad (1)$$

but we add an “irrelevant” variable $X_3$ to the model (irrelevant in the sense that the true $\beta_3$ coefficient
attached to the variable $X_3$ is zero) and estimate

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + v_i \qquad (2)$$

a. Would the R² and the adjusted R² for model (2) be larger than those for model (1)?
b. Are the estimates of $\beta_1$ and $\beta_2$ obtained from model (2) unbiased?
c. Does the inclusion of the “irrelevant” variable $X_3$ affect the variances of $\hat\beta_1$ and $\hat\beta_2$?

13.5. Consider the following “true” (Cobb-Douglas) production function:

$$\ln Y_i = \alpha_0 + \alpha_1 \ln L_{1i} + \alpha_2 \ln L_{2i} + \alpha_3 \ln K_i + u_i$$

where Y = output
$L_1$ = production labor
$L_2$ = nonproduction labor
K = capital

But suppose the regression actually used in empirical investigation is

$$\ln Y_i = \beta_0 + \beta_1 \ln L_{1i} + \beta_2 \ln K_i + u_i$$

On the assumption that you have cross-sectional data on the relevant variables,

a. Will $E(\hat\beta_1) = \alpha_1$ and $E(\hat\beta_2) = \alpha_3$?
b. Will the answer in (a) hold if it is known that $L_2$ is an irrelevant input in the production function?
Show the necessary derivations.

13.6. Refer to Eqs. (13.3.4) and (13.3.5). As you can see, $\hat\alpha_2$, although biased, has a smaller variance than $\hat\beta_2$, which is unbiased. How would you decide on the trade-off between bias and smaller variance?

Hint: The MSE (mean-square error) for the two estimators is expressed as

$$\text{MSE}(\hat\alpha_2) = \frac{\sigma^2}{\sum x_{2i}^2} + \beta_3^2 b_{32}^2 = \text{sampling variance} + \text{square of bias}$$

$$\text{MSE}(\hat\beta_2) = \frac{\sigma^2}{\sum x_{2i}^2 (1 - r_{23}^2)}$$

On MSE, see Appendix A.
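The hint can be explored numerically. The short Python sketch below (the function name `mse_short_and_long` and all input values are hypothetical) evaluates both MSE expressions from the sums of squares and cross products of the deviations $x_{2i}$ and $x_{3i}$, computing $b_{32}$ and $r_{23}^2$ from those same sums so that the two quantities stay mutually consistent.

```python
def mse_short_and_long(sigma2, sx22, sx33, sx23, beta3):
    """MSEs of alpha2_hat (X3 omitted) and beta2_hat (X3 included).

    sigma2 : error variance
    sx22   : sum of x2i^2, where x2i = X2i - mean(X2)
    sx33   : sum of x3i^2
    sx23   : sum of x2i * x3i
    beta3  : true coefficient of X3
    """
    b32 = sx23 / sx22                        # slope from regressing X3 on X2
    r23_sq = sx23 ** 2 / (sx22 * sx33)       # squared correlation of X2 and X3
    mse_alpha2 = sigma2 / sx22 + (beta3 * b32) ** 2   # variance + squared bias
    mse_beta2 = sigma2 / (sx22 * (1.0 - r23_sq))      # unbiased: variance only
    return mse_alpha2, mse_beta2


# Illustrative (hypothetical) inputs giving b32 = 0.5 and r23^2 = 0.25
for beta3 in (0.1, 1.0):
    short, full = mse_short_and_long(4.0, 100.0, 100.0, 50.0, beta3)
    better = "alpha2_hat" if short < full else "beta2_hat"
    print(f"beta3={beta3}: MSE(alpha2_hat)={short:.4f}, "
          f"MSE(beta2_hat)={full:.4f} -> smaller: {better}")
```

With these illustrative inputs the biased short-regression estimator has the smaller MSE when $\beta_3 = 0.1$ but the larger one when $\beta_3 = 1$: the squared bias grows with $\beta_3^2$, while the variance advantage depends only on $r_{23}^2$.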
13.7. Show that $\beta$ estimated from either Eq. (13.5.1) or Eq. (13.5.3) provides an unbiased estimate of true $\beta$.


13.8. Following Friedman’s permanent income hypothesis, we may write

$$Y_i^* = \alpha + \beta X_i^* \qquad (1)$$

where $Y_i^*$ = “permanent” consumption expenditure and $X_i^*$ = “permanent” income. Instead of observing the “permanent” variables, we observe

$$Y_i = Y_i^* + u_i$$
$$X_i = X_i^* + v_i$$

where $Y_i$ and $X_i$ are the quantities that can be observed or measured and where $u_i$ and $v_i$ are measurement errors in $Y^*$ and $X^*$, respectively. Using the observable quantities, we can write the consumption function as

$$Y_i = \alpha + \beta(X_i - v_i) + u_i = \alpha + \beta X_i + (u_i - \beta v_i) \qquad (2)$$

Assuming that (1) $E(u_i) = E(v_i) = 0$, (2) $\text{var}(u_i) = \sigma_u^2$ and $\text{var}(v_i) = \sigma_v^2$, (3) $\text{cov}(Y_i^*, u_i) = 0$ and $\text{cov}(X_i^*, v_i) = 0$, and (4) $\text{cov}(u_i, X_i^*) = \text{cov}(v_i, Y_i^*) = \text{cov}(u_i, v_i) = 0$, show that in large samples $\hat\beta$ estimated from Eq. (2) can be expressed as

$$\text{plim}(\hat\beta) = \frac{\beta}{1 + (\sigma_v^2/\sigma_{X^*}^2)}$$

a. What can you say about the nature of the bias in $\hat\beta$?
b. If the sample size increases indefinitely, will the estimated $\beta$ tend toward equality with the true $\beta$?

13.9. Capital asset pricing model. The capital asset pricing model (CAPM) of modern investment theory postulates the following relationship between the average rate of return of a security (common stock), measured over a certain period, and the volatility of the security, called the beta coefficient (volatility is a measure of risk):

$$R_i = \alpha_1 + \alpha_2 \beta_i + u_i \qquad (1)$$

where $R_i$ = average rate of return of security i
$\beta_i$ = true beta coefficient of security i
$u_i$ = stochastic disturbance term

The true $\beta_i$ is not directly observable but is measured as follows:

$$r_{it} = \alpha_i + \beta_i r_{mt} + e_t \qquad (2)$$

where $r_{it}$ = rate of return of security i for time t
$r_{mt}$ = market rate of return for time t (this rate is the rate of return on some broad market index, such as the S&P index of industrial securities)
$e_t$ = residual term

and where $\beta_i^*$ denotes the estimate of the “true” beta coefficient obtained from this regression. In practice, therefore, instead of estimating Eq. (1), one estimates

$$R_i = \alpha_1 + \alpha_2 \beta_i^* + u_i \qquad (3)$$

where the $\beta_i^*$ are obtained from regression (2). But since the $\beta_i^*$ are estimated, the relationship between the true $\beta_i$ and $\beta_i^*$ can be written as

$$\beta_i^* = \beta_i + v_i \qquad (4)$$

where $v_i$ can be called the error of measurement.

a. What will be the effect of this error of measurement on the estimate of $\alpha_2$?
b. Will the $\alpha_2$ estimated from Eq. (3) provide an unbiased estimate of true $\alpha_2$? If not, is it a consistent estimate of $\alpha_2$? If not, what remedial measures do you suggest?
13.10. Consider the model

$$Y_i = \beta_1 + \beta_2 X_{2i} + u_i \qquad (1)$$

To find out whether this model is mis-specified because it omits the variable $X_3$ from the model, you decide to regress the residuals obtained from model (1) on the variable $X_3$ only. (Note: There is an intercept in this regression.) The Lagrange multiplier (LM) test, however, requires you to regress the residuals from model (1) on both $X_2$ and $X_3$ and a constant. Why is your procedure likely to be inappropriate?*
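To make the LM procedure concrete, here is a minimal pure-Python sketch of the auxiliary regression the exercise describes. The helper names `ols`, `r_squared`, and `lm_statistic` are hypothetical (not from any library), and the data are invented. The residuals from model (1) are regressed on a constant, $X_2$, and $X_3$, and $nR^2$ from that auxiliary regression is compared with the $\chi^2$ distribution with one degree of freedom (one omitted variable), whose 5 percent critical value is about 3.84.

```python
def ols(y, cols):
    """OLS of y on the regressor columns in `cols` (pass a column of ones
    for an intercept). Solves the normal equations X'X b = X'y by
    Gauss-Jordan elimination; returns (coefficients, residuals)."""
    k, n = len(cols), len(y)
    # Augmented matrix [X'X | X'y]
    a = [[sum(cols[r][i] * cols[c][i] for i in range(n)) for c in range(k)]
         + [sum(cols[r][i] * y[i] for i in range(n))]
         for r in range(k)]
    for p in range(k):
        # Partial pivoting for numerical stability
        piv = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[piv] = a[piv], a[p]
        for r in range(k):
            if r != p:
                f = a[r][p] / a[p][p]
                a[r] = [a[r][c] - f * a[p][c] for c in range(k + 1)]
    b = [a[r][k] / a[r][r] for r in range(k)]
    resid = [y[i] - sum(b[j] * cols[j][i] for j in range(k)) for i in range(n)]
    return b, resid


def r_squared(y, resid):
    """Centered R^2 = 1 - RSS/TSS."""
    ybar = sum(y) / len(y)
    tss = sum((v - ybar) ** 2 for v in y)
    return 1.0 - sum(e * e for e in resid) / tss


def lm_statistic(y, x2, x3):
    """LM test for omitting X3 from Y = b1 + b2*X2 + u.

    Step 1: estimate the restricted model (1) and keep its residuals.
    Step 2: regress those residuals on a constant, X2 and X3.
    LM = n * R^2 from step 2, asymptotically chi-square with 1 df
    under the null hypothesis that X3 does not belong in the model.
    """
    n = len(y)
    ones = [1.0] * n
    _, u_hat = ols(y, [ones, x2])
    _, e = ols(u_hat, [ones, x2, x3])
    return n * r_squared(u_hat, e)


# Hypothetical noise-free data in which X3 clearly belongs
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0]
y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x2, x3)]
print(lm_statistic(y, x2, x3))  # the auxiliary fit is exact here, so LM = n
```

In applied work one would use a statistics package; the hand-rolled solver is there only to keep the example self-contained.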

13.11. Consider the model

$$Y_i = \beta_1 + \beta_2 X_i^* + u_i$$

In practice we measure $X_i^*$ by $X_i$ such that

a. $X_i = X_i^* + 5$
b. …
c. $X_i = (X_i^* + \varepsilon_i)$, where $\varepsilon_i$ is a purely random term with the usual properties.

What will be the effect of these measurement errors on estimates of true $\beta_1$ and $\beta_2$?
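Case (a) can be checked with a noise-free illustration. The sketch below uses hypothetical values ($\beta_1 = 2$, $\beta_2 = 0.5$, invented $X^*$ values) and a helper `simple_ols` written for the purpose. It regresses the same Y first on $X^*$ and then on $X = X^* + 5$. Because $Y = \beta_1 + \beta_2(X - 5)$, only the intercept should change, to $\beta_1 - 5\beta_2$.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical, noise-free data generated from Y = 2 + 0.5 X*
beta1, beta2 = 2.0, 0.5
x_star = [10.0, 12.0, 15.0, 19.0, 24.0, 30.0]
y = [beta1 + beta2 * xs for xs in x_star]

# Case (a): X is X* measured with a constant error of +5
x_a = [xs + 5.0 for xs in x_star]

print(simple_ols(x_star, y))  # the true parameters, up to rounding
print(simple_ols(x_a, y))     # same slope; intercept becomes 2 - 5 * 0.5 = -0.5
```

Case (c), a purely random measurement error, is the errors-in-variables problem of Section 13.5; a constant shift leaves the slope intact, while random error does not.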
13.12. Refer to the regression Eqs. (13.3.1) and (13.3.2). In a manner similar to Eq. (13.3.3) show that

$$E(\hat\alpha_1) = \beta_1 + \beta_3(\bar X_3 - b_{32}\bar X_2)$$

where $b_{32}$ is the slope coefficient in the regression of the omitted variable $X_3$ on the included variable $X_2$.
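One way to check this result, together with the slope result $E(\hat\alpha_2) = \beta_2 + \beta_3 b_{32}$, is to set the error term to zero, so that each estimate coincides with its expected value. The sketch below does that with hypothetical parameter values and hypothetical, correlated $X_2$ and $X_3$ series; `simple_ols` is a small helper written for the illustration.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical true three-variable model with u = 0
beta1, beta2, beta3 = 1.0, 2.0, 3.0
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [2.0, 1.0, 5.0, 4.0, 8.0, 7.0]
y = [beta1 + beta2 * a + beta3 * b for a, b in zip(x2, x3)]

alpha1, alpha2 = simple_ols(x2, y)  # short regression that omits X3
_, b32 = simple_ols(x2, x3)         # auxiliary regression of X3 on X2
x2bar = sum(x2) / len(x2)
x3bar = sum(x3) / len(x3)

print(alpha2, beta2 + beta3 * b32)                    # slope: Eq. (13.3.3)
print(alpha1, beta1 + beta3 * (x3bar - b32 * x2bar))  # intercept: Exercise 13.12
```

Note that $\bar X_3 - b_{32}\bar X_2$ is simply the intercept of the auxiliary regression of $X_3$ on $X_2$, so the intercept bias is $\beta_3$ times that intercept.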
13.13. Critically evaluate the following view expressed by Leamer:†

My interest in metastatistics [i.e., theory of inference actually drawn from data] stems from my observations of economists at work. The opinion that econometric theory is irrelevant is held by an embarrassingly large share of the economic profession. The wide gap between econometric theory and econometric practice might be expected to cause professional tension. In fact, a calm equilibrium permeates our journals and our [professional] meetings. We comfortably divide ourselves into a celibate priesthood of statistical theorists, on the one hand, and a legion of inveterate sinner-data analysts, on the other. The priests are empowered to draw up lists of sins and are revered for the special talents they display. Sinners are not expected to avoid sins: they need only confess their errors openly.

*See Maddala, op. cit., p. 477.

†Edward E. Leamer, Specification Searches: Ad Hoc Inference with Nonexperimental Data, John Wiley & Sons, New York, 1978, p. vi.

13.14. Evaluate the following statement made by Henry Theil:*

Given the present state of the art, the most sensible procedure is to interpret confidence coefficients and significance limits liberally when confidence intervals and test statistics are computed from the final regression of a regression strategy in the conventional way. That is, a 95 percent confidence coefficient may actually be an 80 percent confidence coefficient and a 1 percent significance level may actually be a 10 percent level.


13.15. Commenting on the econometric methodology practiced in the 1950s and early 1960s, Blaug stated:†

... much of it [i.e., empirical research] is like playing tennis with the net down: instead of attempting to refute testable predictions, modern economists all too frequently are satisfied to demonstrate that the real world conforms to their predictions, thus replacing falsification [à la Popper], which is difficult, with verification, which is easy.


Do you agree with this view? You may want to peruse Blaug’s book to learn more about his views. 

13.16. According to Blaug, “There is no logic of proof but there is logic of disproof.”‡ What does he mean by this?

13.17. Refer to the St. Louis model discussed in the text. Keeping in mind the problems associated with the 
nested F test, critically evaluate the results presented in regression (13.8.4). 

13.18. Suppose the true model is

$$Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + \beta_4 X_i^3 + u_i$$

but you estimate

$$Y_i = \alpha_1 + \alpha_2 X_i + v_i$$

If you use observations of Y at $X = -3, -2, -1, 0, 1, 2, 3$, and estimate the “incorrect” model, what bias will result in these estimates?§
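A derivation for this exercise can be checked numerically. Because the OLS estimators are linear in Y and the X values are fixed, their expectations equal the OLS fit to $E(Y) = \beta_1 + \beta_2 X + \beta_3 X^2 + \beta_4 X^3$. The sketch below (helper names hypothetical) switches on one true coefficient at a time to show where each one ends up in $E(\hat\alpha_1)$ and $E(\hat\alpha_2)$.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


X_VALUES = [-3, -2, -1, 0, 1, 2, 3]


def expected_short_estimates(b1, b2, b3, b4, xs=X_VALUES):
    """Expected (alpha1_hat, alpha2_hat) when the true model is cubic.

    OLS is linear in Y and the X's are fixed, so the expected estimates
    are the OLS fit to E(Y) = b1 + b2 X + b3 X^2 + b4 X^3.
    """
    ey = [b1 + b2 * x + b3 * x ** 2 + b4 * x ** 3 for x in xs]
    return simple_ols(xs, ey)


# Switch on one true coefficient at a time
for name, coefs in [("beta1", (1, 0, 0, 0)), ("beta2", (0, 1, 0, 0)),
                    ("beta3", (0, 0, 1, 0)), ("beta4", (0, 0, 0, 1))]:
    print(name, expected_short_estimates(*coefs))
```

Since each row gives the contribution per unit of one true coefficient, the expected estimates for any $(\beta_1, \ldots, \beta_4)$ are the corresponding weighted sums of the rows.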
13.19. To see if the variable $X_i^2$ belongs in the model $Y_i = \beta_1 + \beta_2 X_i + u_i$, Ramsey’s RESET test would estimate the linear model, obtaining the estimated $Y_i$ values from this model [i.e., $\hat Y_i = \hat\beta_1 + \hat\beta_2 X_i$], and then estimate the model $Y_i = \alpha_1 + \alpha_2 X_i + \alpha_3 \hat Y_i^2 + v_i$ and test the significance of $\alpha_3$. Prove that, if $\hat\alpha_3$ turns out to be statistically significant in the preceding (RESET) equation, it is the same thing as estimating the following model directly: $Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + u_i$. (Hint: Substitute for $\hat Y_i$ in the RESET regression.)
13.20. State with reason whether the following statements are true or false.**

a. An observation can be influential but not an outlier.
b. An observation can be an outlier but not influential.
c. An observation can be both influential and an outlier.
d. If, in the model $Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + u_i$, $\hat\beta_3$ turns out to be statistically insignificant, we should retain the linear term $X_i$ even if $\hat\beta_2$ is statistically insignificant.
e. If you estimate the model $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$ or $Y_i = \alpha_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + u_i$ by OLS, the estimated regression line is the same, where $x_{2i} = (X_{2i} - \bar X_2)$ and $x_{3i} = (X_{3i} - \bar X_3)$.


*Henry Theil, Principles of Econometrics, John Wiley & Sons, New York, 1971, pp. 605-606.

†M. Blaug, The Methodology of Economics: Or How Economists Explain, Cambridge University Press, New York, 1980, p. 256.
‡Ibid., p. 14.

§Adapted from G. A. F. Seber, Linear Regression Analysis, John Wiley & Sons, New York, 1977, p. 176.

¶Adapted from Kerry Peterson, op. cit., pp. 184-185.

**Adapted from Norman R. Draper and Harry Smith, op. cit., pp. 606-607.

Empirical Exercises

13.21. Use the data for the demand for potatoes given in Exercise 7.19. Suppose you are told that the true demand function is

$$\ln Y_t = \beta_1 + \beta_2 \ln X_{2t} + \beta_3 \ln X_{3t} + \beta_4 \ln X_{4t} + u_t \qquad (1)$$

but you think differently and estimate the following demand function:

$$\ln Y_t = \alpha_1 + \alpha_2 \ln X_{2t} + \alpha_3 \ln X_{3t} + v_t \qquad (2)$$

where Y = per capita consumption of potatoes in kg
$X_2$ = income per capita in thousand rupees at 1993-94 prices
$X_3$ = price of potatoes in rupees per kg
$X_4$ = price of cauliflower in rupees per kg

a. Carry out RESET and LM tests of specification errors, assuming that the demand function (1) just given is the truth.
b. Suppose $\beta_4$ in Eq. (1) turns out to be statistically insignificant. Does that mean there is no specification error if we fit Eq. (2) to the data?
c. If $\beta_4$ turns out to be insignificant, does that mean one should not introduce the price of a substitute product(s) as an argument in the demand function?
13.22. Continue with Exercise 13.21. Strictly for pedagogical purposes, assume that model (2) is the true demand function.
a. If we now estimate model (1), what type of specification error is committed in this instance?
b. What are the theoretical consequences of this specification error? Illustrate with the data at hand.
13.23. The true model is

$$Y_i^* = \beta_1 + \beta_2 X_i^* + u_i \qquad (1)$$

but because of errors of measurement you estimate

$$Y_i = \alpha_1 + \alpha_2 X_i + v_i \qquad (2)$$

where $Y_i = Y_i^* + \varepsilon_i$ and $X_i = X_i^* + w_i$, where $\varepsilon_i$ and $w_i$ are measurement errors.

Using the data given in Table 13.2, document the consequences of estimating model (2) instead of the true model (1).

13.24. Monte Carlo experiment.* Ten individuals had weekly permanent income as follows: \$200, 220, 240, 260, 280, 300, 320, 340, 380, and 400. Permanent consumption ($Y_i^*$) was related to permanent income ($X_i^*$) as

$$Y_i^* = 0.8 X_i^* \qquad (1)$$

Each of these individuals had transitory income equal to 100 times a random number $u_i$ drawn from a normal population with mean = 0 and $\sigma^2 = 1$ (i.e., a standard normal variable). Assume that there is no transitory component in consumption. Thus, measured consumption and permanent consumption are the same.

a. Draw 10 random numbers from a normal population with zero mean and unit variance and obtain 10 numbers for measured income $X_i$ (= $X_i^* + 100u_i$).
b. Regress permanent (= measured) consumption on measured income using the data obtained in (a) and compare your results with those shown in Eq. (1). A priori, the intercept should be zero (why?). Is that the case? Why or why not?
c. Repeat (a) 100 times and obtain 100 regressions as shown in (b) and compare your results with the true regression (1). What general conclusions do you draw?

*Adapted from Christopher Dougherty, Introduction to Econometrics, Oxford University Press, New York, 1992, pp. 253-256.
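The experiment can be scripted directly. The sketch below uses Python's standard `random` module; the seed and helper names are arbitrary choices for illustration, so the exact averages it prints depend on the seed. Steps (a), (b), and (c) are marked in the comments.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Permanent income as listed in the exercise, and the true MPC of Eq. (1)
PERMANENT_INCOME = [200, 220, 240, 260, 280, 300, 320, 340, 380, 400]
TRUE_SLOPE = 0.8


def one_replication(rng):
    # (a) measured income = permanent income + 100 * standard normal draw
    x = [xp + 100.0 * rng.gauss(0.0, 1.0) for xp in PERMANENT_INCOME]
    # no transitory consumption, so measured = permanent consumption
    y = [TRUE_SLOPE * xp for xp in PERMANENT_INCOME]
    # (b) regress consumption on measured income
    return simple_ols(x, y)


def monte_carlo(reps=100, seed=12345):
    # (c) repeat (a) and (b) and average the estimates
    rng = random.Random(seed)
    results = [one_replication(rng) for _ in range(reps)]
    mean_intercept = sum(r[0] for r in results) / reps
    mean_slope = sum(r[1] for r in results) / reps
    return mean_intercept, mean_slope


mean_b1, mean_b2 = monte_carlo()
print(f"average intercept {mean_b1:.2f}, average slope {mean_b2:.3f} "
      f"(true values: 0 and 0.8)")
```

With transitory-income variance ($100^2$) well above the spread of permanent income, the average slope should come out far below 0.8 and the average intercept well above zero, the attenuation pattern of the errors-in-variables result in Section 13.5.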

13.25. Refer to Exercise 8.26. With the definitions of the variables given there, consider the following two models to explain Y:

Model A: $Y_t = \alpha_1 + \alpha_2 X_{3t} + \alpha_3 X_{4t} + \alpha_4 X_{6t} + u_t$
Model B: $Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{5t} + \beta_4 X_{6t} + u_t$

Using the nested F test, how will you choose between the two models?

13.26. Continue with Exercise 13.25. Using the J test, how would you decide between the two models?

13.27. Refer to Exercise 7.19, which is concerned with the demand for potatoes in India. There you were given four models.

a. What is the difference between model 1 and model 2? If model 2 is correct and you estimate model 1, what kind of error is committed? Which test would you apply: equation specification error or model selection error? Show the necessary calculations.

b. Between models 1 and 4, which would you choose? Which test(s) do you use and why?

13.28. Refer to Table 8.11, which gives data on gross domestic savings (Y) and personal disposable income (X) for the period 1974-75 to 2004-05. Now consider the following models:

Model A: $Y_t = \alpha_1 + \alpha_2 X_t + \alpha_3 X_{t-1} + u_t$
Model B: $Y_t = \beta_1 + \beta_2 X_t + \beta_3 Y_{t-1} + u_t$

How would you choose between these two models? State clearly the test procedure(s) you use and show all the calculations. Suppose someone contends that the interest rate variable belongs in the savings function. How would you test this? Collect data on the commercial bank deposit rate for time deposits of above 5 years as a proxy for the interest rate and demonstrate your answer.

13.29. Use the data in Exercise 13.28. To familiarize yourself with recursive least squares, estimate the savings functions for 1974-75 to 1980-81, 1974-75 to 1985-86, 1974-75 to 1988-89, and 1974-75 to 1995-96. Comment on the stability of estimated coefficients in the savings functions.
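Recursive least squares proper updates the estimates one observation at a time. With the usual initialization, however, the estimate after the first t observations coincides with OLS on those t observations, so re-running OLS on each expanding window gives the same estimates at the windows the exercise lists. The sketch below takes that simpler route. It assumes a savings function with income as the single regressor, and the arrays `income` and `savings` are hypothetical placeholders to be replaced with the Table 8.11 data.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def expanding_window_estimates(x, y, ends):
    """(intercept, slope) estimated on observations 1..end for each end."""
    return {end: simple_ols(x[:end], y[:end]) for end in ends}


# Window ends in the exercise, counting 1974-75 as observation 1:
# to 1980-81 -> 7, to 1985-86 -> 12, to 1988-89 -> 15, to 1995-96 -> 22
WINDOW_ENDS = (7, 12, 15, 22)

# Hypothetical placeholder data; substitute income (X) and savings (Y)
income = [100.0 + 10.0 * i for i in range(22)]
savings = [3.0 + 0.2 * xi for xi in income]

for end, (b1, b2) in expanding_window_estimates(income, savings, WINDOW_ENDS).items():
    print(end, round(b1, 4), round(b2, 4))
```

Estimates that drift noticeably from one window to the next are the informal sign of instability the exercise asks you to comment on.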

13.30. Continue with Exercise 13.29, but now use the updated data in Table 8.11.

a. Suppose you estimate the savings function for 1974-75 to 1988-89. Using the parameters thus estimated and the personal disposable income data from 1989-90 to 1996-97, estimate the predicted savings for the latter period and use Chow’s prediction failure test to find out whether it rejects the hypothesis that the savings function has not changed between the two time periods.

b. Now estimate the savings function for the data from 1997-98 to 2004-05. Compare the results to the function for the 1989-90 to 1996-97 period using the same method as above (Chow’s prediction failure test). Is there a significant change in the savings function between the two periods?
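The statistic for Chow’s prediction failure test can be sketched as follows. With $n_1$ estimation-period observations, $n_2$ prediction-period observations, and k estimated parameters, $F = [(RSS_R - RSS_1)/n_2] / [RSS_1/(n_1 - k)]$ with $(n_2, n_1 - k)$ degrees of freedom, where $RSS_R$ comes from the regression on all $n_1 + n_2$ observations and $RSS_1$ from the first $n_1$ only. The sketch assumes a one-regressor savings function (k = 2); the function names and the data, which have a deliberate intercept shift after observation 10, are hypothetical.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def rss(x, y):
    """Residual sum of squares from the OLS regression of y on x."""
    a, b = simple_ols(x, y)
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def chow_prediction_failure(x, y, n1, k=2):
    """Return (F, (df1, df2)) for Chow's prediction failure test.

    The first n1 observations are the estimation period; the rest
    (n2 of them) are the prediction period.
    """
    n2 = len(y) - n1
    rss_r = rss(x, y)            # all n1 + n2 observations
    rss_1 = rss(x[:n1], y[:n1])  # estimation period only
    f_stat = ((rss_r - rss_1) / n2) / (rss_1 / (n1 - k))
    return f_stat, (n2, n1 - k)


# Hypothetical data: small alternating noise, then an intercept shift
x = [float(t) for t in range(1, 16)]
y = ([1.0 + 2.0 * t + 0.1 * (-1) ** int(t) for t in x[:10]]
     + [10.0 + 2.0 * t for t in x[10:]])
print(chow_prediction_failure(x, y, n1=10))
```

A computed F exceeding the critical F value at the chosen significance level leads to rejecting the hypothesis that the function is unchanged over the prediction period.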

13.31. Omission of a variable in the k-variable regression model. Refer to Eq. (13.3.3), which shows the bias in omitting the variable $X_3$ from the model $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$. This can be generalized as follows: In the k-variable model $Y_i = \beta_1 + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$, suppose we omit the variable $X_k$. Then it can be shown that the omitted variable bias of the slope coefficient of included variable $X_j$ is:

$$E(\hat\beta_j) = \beta_j + \beta_k b_{kj} \qquad j = 2, 3, \ldots, (k-1)$$

where $b_{kj}$ is the (partial) slope coefficient of $X_j$ in the auxiliary regression of the excluded variable $X_k$ on all the explanatory variables included in the model.

Refer to Exercise 13.21. Find out the bias of the coefficients in Eq. (1) if we exclude the variable $\ln X_4$ from the model. Is this exclusion serious? Show the necessary calculations.


Key to Multiple Choice Questions 


1. (b) 2. (d) 3. (a) 4. (d) 5. (c) 6. (a) 7. (…) 8. (c) 9. (b)
10. (b) 11. (d) 12. (d) 13. (b) 14. (c) 15. (c) 16. (a) 17. (c) 18. (b)
19. (a) 20. (…) 21. (…) 22. (…) 23. (…) 24. (c) 25. (b)


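Appendix 13A

13A.1 The Proof that $E(b_{12}) = \beta_2 + \beta_3 b_{32}$ [Equation (13.3.3)]*

In the deviation form the three-variable population regression model can be written as

$$y_i = \beta_2 x_{2i} + \beta_3 x_{3i} + (u_i - \bar u) \qquad (1)$$

First multiplying by $x_{2i}$ and then by $x_{3i}$ and summing, we obtain the usual normal equations:

$$\sum y_i x_{2i} = \beta_2 \sum x_{2i}^2 + \beta_3 \sum x_{2i} x_{3i} + \sum x_{2i}(u_i - \bar u) \qquad (2)$$

$$\sum y_i x_{3i} = \beta_2 \sum x_{2i} x_{3i} + \beta_3 \sum x_{3i}^2 + \sum x_{3i}(u_i - \bar u) \qquad (3)$$

Dividing Eq. (2) by $\sum x_{2i}^2$ on both sides, we obtain

$$\frac{\sum y_i x_{2i}}{\sum x_{2i}^2} = \beta_2 + \beta_3 \frac{\sum x_{2i} x_{3i}}{\sum x_{2i}^2} + \frac{\sum x_{2i}(u_i - \bar u)}{\sum x_{2i}^2} \qquad (4)$$

Now recalling that

$$b_{12} = \frac{\sum y_i x_{2i}}{\sum x_{2i}^2} \qquad b_{32} = \frac{\sum x_{2i} x_{3i}}{\sum x_{2i}^2}$$

Eq. (4) can be written as

$$b_{12} = \beta_2 + \beta_3 b_{32} + \frac{\sum x_{2i}(u_i - \bar u)}{\sum x_{2i}^2} \qquad (5)$$

Taking the expected value of Eq. (5) on both sides, we finally obtain

$$E(b_{12}) = \beta_2 + \beta_3 b_{32} \qquad (6)$$

where use is made of the facts that (a) for a given sample, $b_{32}$ is a known fixed quantity, (b) $\beta_2$ and $\beta_3$ are constants, and (c) $u_i$ is uncorrelated with $X_{2i}$ (as well as $X_{3i}$).

*This can be generalized to the case where more than one relevant X variable is excluded from the model. On this, see Chandan Mukherjee et al., op. cit., p. 215.

13A.2 The Consequences of Including an Irrelevant Variable: The Unbiasedness Property

For the true model (13.3.6), we have

$$\hat\beta_2 = \frac{\sum y_i x_{2i}}{\sum x_{2i}^2} \qquad (1)$$

and we know that it is unbiased.

For the model (13.3.7), we obtain

$$\hat\alpha_2 = \frac{(\sum y_i x_{2i})(\sum x_{3i}^2) - (\sum y_i x_{3i})(\sum x_{2i} x_{3i})}{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2} \qquad (2)$$

Now the true model in deviation form is

$$y_i = \beta_2 x_{2i} + (u_i - \bar u) \qquad (3)$$

Substituting for $y_i$ from model (3) into model (2) and simplifying, we obtain

$$E(\hat\alpha_2) = \beta_2 \frac{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2}{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2} = \beta_2 \qquad (4)$$

that is, $\hat\alpha_2$ remains unbiased.

We also obtain

$$\hat\alpha_3 = \frac{(\sum y_i x_{3i})(\sum x_{2i}^2) - (\sum y_i x_{2i})(\sum x_{2i} x_{3i})}{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2} \qquad (5)$$

Substituting for $y_i$ from model (3) into model (5) and simplifying, we obtain

$$E(\hat\alpha_3) = \beta_2 \frac{(\sum x_{2i} x_{3i})(\sum x_{2i}^2) - (\sum x_{2i}^2)(\sum x_{2i} x_{3i})}{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2} = 0 \qquad (6)$$

which is its value in the true model since $X_3$ is absent from the true model.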
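Eqs. (2) and (5) can be evaluated directly to see results (4) and (6) at work. In the sketch below the error term is set to zero, so each estimate equals its expected value. The data and helper names are hypothetical: the true model contains only $X_2$ (with $\beta_2 = 1.5$), and $X_3$ is correlated with $X_2$ but irrelevant.

```python
def deviations(v):
    """Deviations from the sample mean."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(a, b):
    """Sum of elementwise products."""
    return sum(p * q for p, q in zip(a, b))


def alpha_hats(y, x2, x3):
    """alpha2_hat and alpha3_hat computed from Eqs. (2) and (5)."""
    yd, d2, d3 = deviations(y), deviations(x2), deviations(x3)
    s22, s33, s23 = dot(d2, d2), dot(d3, d3), dot(d2, d3)
    sy2, sy3 = dot(yd, d2), dot(yd, d3)
    den = s22 * s33 - s23 ** 2
    return (sy2 * s33 - sy3 * s23) / den, (sy3 * s22 - sy2 * s23) / den


# Hypothetical data: true model Y = 4 + 1.5 X2 with u = 0
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [2.0, 1.0, 5.0, 4.0, 8.0, 7.0]  # correlated with X2, but irrelevant
y = [4.0 + 1.5 * a for a in x2]
print(alpha_hats(y, x2, x3))         # about (1.5, 0.0), as in Eqs. (4) and (6)
```

With a nonzero error term the estimates would scatter around these values; the cost of the irrelevant variable shows up in their variances, not in their means.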
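13A.3 The Proof of Equation (13.5.10)

We have

$$Y_i = \alpha + \beta X_i^* + u_i \qquad (1)$$

$$X_i = X_i^* + w_i \qquad (2)$$

Therefore, in deviation form we obtain

$$y_i = \beta x_i^* + (u_i - \bar u) \qquad (3)$$

$$x_i = x_i^* + (w_i - \bar w) \qquad (4)$$

Now when we use

$$Y_i = \alpha + \beta X_i + u_i \qquad (5)$$

we obtain

$$\hat\beta = \frac{\sum y_i x_i}{\sum x_i^2} = \frac{\sum [\beta x_i^* + (u_i - \bar u)][x_i^* + (w_i - \bar w)]}{\sum [x_i^* + (w_i - \bar w)]^2} \qquad \text{using (3) and (4)}$$

$$= \frac{\beta \sum x_i^{*2} + \beta \sum x_i^*(w_i - \bar w) + \sum x_i^*(u_i - \bar u) + \sum (u_i - \bar u)(w_i - \bar w)}{\sum x_i^{*2} + 2\sum x_i^*(w_i - \bar w) + \sum (w_i - \bar w)^2}$$

We cannot simply take the expectation of this expression, because the expectation of the ratio of two variables is not equal to the ratio of their expectations (note: the expectations operator E is a linear operator). Instead, we first divide each term of the numerator and the denominator by n and take the probability limit, plim (see Appendix A for details of plim), of

$$\frac{(1/n)\left[\beta \sum x_i^{*2} + \beta \sum x_i^*(w_i - \bar w) + \sum x_i^*(u_i - \bar u) + \sum (u_i - \bar u)(w_i - \bar w)\right]}{(1/n)\left[\sum x_i^{*2} + 2\sum x_i^*(w_i - \bar w) + \sum (w_i - \bar w)^2\right]}$$

Now the probability limit of the ratio of two variables is the ratio of their probability limits (provided the plim of the denominator is nonzero). Applying this rule and taking the plim of each term, we obtain

$$\text{plim}\,\hat\beta = \frac{\beta \sigma_{X^*}^2}{\sigma_{X^*}^2 + \sigma_w^2}$$

where $\sigma_{X^*}^2$ and $\sigma_w^2$ are the limiting values of the sample variances of $X^*$ and w as the sample size increases indefinitely, and where we have used the fact that, as the sample size increases indefinitely, there is no correlation between the errors u and w, nor between them and the true $X^*$. From the preceding expression, we finally obtain

$$\text{plim}\,\hat\beta = \beta\left[\frac{1}{1 + \sigma_w^2/\sigma_{X^*}^2}\right]$$

which is the required result.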


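13A.4 The Proof of Equation (13.6.2)

Since there is no intercept in the model, the estimate of $\alpha$, according to the formula for the regression through the origin, is as follows:

$$\hat\alpha = \frac{\sum X_i Y_i}{\sum X_i^2} \qquad (1)$$

Substituting for Y from the true model (13.2.8), we obtain

$$\hat\alpha = \frac{\sum X_i (\beta X_i u_i)}{\sum X_i^2} = \beta \frac{\sum X_i^2 u_i}{\sum X_i^2} \qquad (2)$$

Statistical theory shows that if $\ln u_i \sim N(0, \sigma^2)$ then

$$u_i \sim \text{log normal}\left[e^{\sigma^2/2},\; e^{\sigma^2}\left(e^{\sigma^2} - 1\right)\right] \qquad (3)$$

Therefore,

$$E(\hat\alpha) = \beta E\left(\frac{X_1^2 u_1 + X_2^2 u_2 + \cdots + X_n^2 u_n}{\sum X_i^2}\right) = \beta e^{\sigma^2/2}\left(\frac{\sum X_i^2}{\sum X_i^2}\right) = \beta e^{\sigma^2/2}$$

where use is made of the fact that the X’s are nonstochastic and each $u_i$ has an expected value of $e^{\sigma^2/2}$. Since $E(\hat\alpha) \neq \beta$, $\hat\alpha$ is a biased estimator of $\beta$.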
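A small simulation makes the bias factor $e^{\sigma^2/2}$ visible. The sketch below draws repeated samples from the true model $Y_i = \beta X_i u_i$ with $\ln u_i \sim N(0, \sigma^2)$, applies the regression-through-the-origin formula (1), and averages $\hat\alpha$. The values of $\beta$, $\sigma$, the fixed X’s, the number of replications, and the seed are all hypothetical choices.

```python
import math
import random


def simulate_mean_alpha_hat(beta, sigma, xs, reps, seed=1):
    """Average of alpha_hat = sum(X*Y) / sum(X^2) over simulated samples
    from Y_i = beta * X_i * u_i with ln u_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    sxx = sum(x * x for x in xs)
    total = 0.0
    for _ in range(reps):
        ys = [beta * x * math.exp(rng.gauss(0.0, sigma)) for x in xs]
        total += sum(x * y for x, y in zip(xs, ys)) / sxx  # Eq. (1)
    return total / reps


beta, sigma = 2.0, 0.5
xs = [float(i) for i in range(1, 11)]          # fixed (nonstochastic) X values
simulated = simulate_mean_alpha_hat(beta, sigma, xs, reps=20_000)
theoretical = beta * math.exp(sigma ** 2 / 2)  # E(alpha_hat) derived above
print(simulated, theoretical)
```

Both numbers should sit near $2e^{0.125} \approx 2.27$ rather than the true $\beta = 2$, and the gap widens as $\sigma$ grows.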


TOPICS IN ECONOMETRICS 


In Part 1 we introduced the classical linear regression model with all its assumptions. In Part 2 we examined 
in detail the consequences that ensue when one or more of the assumptions are not satisfied and what can be 
done about them. In Part 3 we study some selected but commonly encountered econometric techniques. In 
particular, we discuss these topics: (1) nonlinear-in-the-parameter regression models, (2) qualitative response 
regression models, (3) panel data regression models, and (4) dynamic econometric models. 

In Chapter 14, we consider models that are intrinsically nonlinear in the parameters. With the ready avail- 
ability of software packages, it is no longer a big challenge to estimate such models. Although the underlying 
mathematics may elude some readers, the basic ideas of nonlinear-in-the-parameter regression models can 
be explained intuitively. With suitable examples, this chapter shows how such models are estimated and
interpreted.

In Chapter 15, we consider regression models in which the dependent variable is qualitative in nature. This 
chapter therefore complements Chapter 9, where we discussed models in which the explanatory variables 
were qualitative in nature. The basic thrust of this chapter is on developing models in which the regressand is 
of the yes or no type. Since ordinary least squares (OLS) poses several problems in estimating such models, 
several alternatives have been developed. In this chapter we consider two such alternatives, namely, the logit 
model and the probit model. This chapter also discusses several variants of the qualitative response models, 
such as the Tobit model and the Poisson regression model. Several extensions of the qualitative response 
models are also briefly discussed, such as the ordered probit, ordered logit, and multinomial logit. 

In Chapter 16 we discuss panel data regression models. Such models combine time series and cross- 
section observations. Although by combining such observations we increase the sample size, panel data 
regression models pose several estimation challenges. In this chapter we discuss only the essentials of such 
models and guide the reader to the appropriate resources for further study. 

In Chapter 17, we consider regression models that include current as well as past, or lagged, values of 
the explanatory variables in addition to models that include the lagged value(s) of the dependent variable as 
one of the explanatory variables. These models are called, respectively, distributed lag and autoregressive 
models. Although such models are extremely useful in empirical econometrics, they pose some special
estimation problems because they violate one or more assumptions of the classical regression model. We
consider these special problems in the context of the Koyck, the adaptive-expectations (AE), and the partial- 
adjustment models. We also note the criticism leveled against the AE model by the advocates of the so-called 
rational expectations (RE) school. 


CHAPTER 14

Nonlinear Regression Models


The major emphasis of this book is on linear regression models, that is, models that are linear in the parameters and/or models that can be transformed so that they are linear in the parameters. On occasion, however, for theoretical or empirical reasons we have to consider models that are nonlinear in the parameters.¹ In this chapter we take a look at such models and study their special features.


14.1 Intrinsically Linear and Intrinsically Nonlinear Regression Models 


When we started our discussion of linear regression models in Chapter 2, we stated that our concern in this 
book is basically with models that are linear in the parameters: they may or may not be linear in the variables. 
If you refer to Table 2.3, you will see that a model that is linear in the parameters as well as the variables is 
a linear regression model and so is a model that is linear in the parameters but nonlinear in the variables. On 
the other hand, if a model is nonlinear in the parameters it is a nonlinear (in-the-parameter) regression model 
whether the variables of such a model are linear or not. 

However, one has to be careful here, for some models look nonlinear in the parameters but are inher- 
ently or intrinsically linear because with suitable transformation they can be made linear-in-the-parameter 
regression models. But if such models cannot be linearized in the parameters, they are called intrinsically 
nonlinear regression models. From now on when we talk about a nonlinear regression model, we mean that 
it is intrinsically nonlinear. For brevity, we will call them NLRM. 

To drive home the distinction between the two, let us revisit Exercises 2.6 and 2.7. In Exercise 2.6, Models a, b, c, and e are linear regression models because they are all linear in the parameters. Model d is a mixed bag, for it is linear in $\beta_2$ but not in $\beta_1$, which enters as $\ln \beta_1$. But if we let $\alpha = \ln \beta_1$, then this model is linear in $\alpha$ and $\beta_2$.


¹We noted in Chapter 4 that under the assumption of normally distributed error term, the OLS estimators are not only
BLUE but are BUE (best unbiased estimator) in the entire class of estimators, linear or not. But if we drop the assumption of 
normality, as Davidson and MacKinnon note, it is possible to obtain nonlinear and/or biased estimators that may perform 
better than the OLS estimators. See Russell Davidson and James G. MacKinnon, Estimation and Inference in Econometrics, 
Oxford University Press, New York, 1993, p. 161. 




In Exercise 2.7, Models d and e are intrinsically nonlinear because there is no simple way to linearize them. Model c is obviously a linear regression model. What about Models a and b? Taking the logarithms on both sides of a, we obtain $\ln Y_i = \beta_1 + \beta_2 X_i + u_i$, which is linear in the parameters. Hence Model a is intrinsically a linear regression model. Model b is an example of the logistic (probability) distribution function, and we will study this in Chapter 15. On the surface, it seems that this is a nonlinear regression model. But a simple mathematical trick will render it a linear regression model, namely,


$$\ln\left(\frac{1 - Y_i}{Y_i}\right) = \beta_1 + \beta_2 X_i + u_i \qquad (14.1.1)$$

Therefore, Model b is intrinsically linear. We will see the utility of models like Eq. (14.1.1) in the next 
chapter. 
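The linearizing trick can be verified numerically. The sketch below assumes that Model b has the form $Y_i = 1/(1 + e^{\beta_1 + \beta_2 X_i + u_i})$, which is the form Eq. (14.1.1) inverts; the error term is set to zero, and the parameter values are hypothetical. For each X, the left-hand side $\ln[(1 - Y_i)/Y_i]$ should reproduce $\beta_1 + \beta_2 X_i$ exactly.

```python
import math


def model_b(x, b1, b2):
    """Model b with the error term set to zero: Y = 1 / (1 + exp(b1 + b2 * x))."""
    return 1.0 / (1.0 + math.exp(b1 + b2 * x))


def lhs_14_1_1(y):
    """Left-hand side of Eq. (14.1.1): ln((1 - Y) / Y), defined for 0 < Y < 1."""
    return math.log((1.0 - y) / y)


b1, b2 = -1.0, 0.5  # hypothetical parameter values
for x in [-2.0, 0.0, 1.5, 4.0]:
    y = model_b(x, b1, b2)
    print(x, round(y, 4), lhs_14_1_1(y), b1 + b2 * x)  # last two columns agree
```

Because the transformed variable is a linear function of X, OLS can be applied to it directly, which is why Model b counts as intrinsically linear.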

Consider now the famous Cobb-Douglas (C-D) production function. Letting Y = output, $X_2$ = labor input, and $X_3$ = capital input, we will write this function in three different ways:

$$Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} e^{u_i} \qquad (14.1.2)$$

or,

$$\ln Y_i = \alpha + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i \qquad (14.1.2a)$$

where $\alpha = \ln \beta_1$. Thus in this format the C-D function is intrinsically linear.

Now consider this version of the C-D function:

$$Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} u_i \qquad (14.1.3)$$

or,

$$\ln Y_i = \alpha + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + \ln u_i \qquad (14.1.3a)$$

where $\alpha = \ln \beta_1$. This model too is linear in the parameters.

But now consider the following version of the C-D function:

$$Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} + u_i \qquad (14.1.4)$$

As we just noted, C-D versions (14.1.2a) and (14.1.3a) are intrinsically linear (in-the-parameter) regression models, but there is no way to transform Eq. (14.1.4) so that the transformed model can be made linear in the parameters.² Therefore, Eq. (14.1.4) is intrinsically a nonlinear regression model.

Another well-known but intrinsically nonlinear function is the constant elasticity of substitution (CES) production function, of which the Cobb-Douglas production function is a special case. The CES production function takes the following form:

$$Y_i = A\left[\delta K_i^{-\beta} + (1 - \delta) L_i^{-\beta}\right]^{-1/\beta} \qquad (14.1.5)$$

where Y = output, K = capital input, L = labor input, A = scale parameter, $\delta$ = distribution parameter ($0 < \delta < 1$), and $\beta$ = substitution parameter ($\beta \geq -1$).³ No matter in what form you enter the stochastic error term $u_i$ in this production function, there is no way to make it a linear (in-the-parameter) regression model. It is intrinsically a nonlinear regression model.


²If you try to log-transform the model, it will not work because $\ln(A + B) \neq \ln A + \ln B$.


³For properties of the CES production function, see Michael D. Intriligator, Ronald Bodkin, and Cheng Hsiao, Econometric Models, Techniques, and Applications, 2d ed., Prentice Hall, 1996, pp. 294-295.

14.2 Estimation of Linear and Nonlinear Regression Models


To see the difference in estimating linear and nonlinear regression models, consider the following two models: 


$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad (14.2.1)$$

$$Y_i = \beta_1 e^{\beta_2 X_i} + u_i \qquad (14.2.2)$$


By now you know that Eq. (14.2.1) is a linear regression model, whereas Eq. (14.2.2) is a nonlinear regression 
model. Regression (14.2.2) is known as the exponential regression model and is often used to measure the 
growth of a variable, such as population, GDP, or money supply. 

Suppose we consider estimating the parameters of the two models by ordinary least squares (OLS). In 
OLS we minimize the residual sum of squares (RSS), which for model (14.2.1) is: 


$$\sum \hat u_i^2 = \sum (Y_i - \hat\beta_1 - \hat\beta_2 X_i)^2 \qquad (14.2.3)$$

where as usual $\hat\beta_1$ and $\hat\beta_2$ are the OLS estimators of the true $\beta$’s. Differentiating the preceding expression with respect to the two unknowns, we obtain the normal equations shown in Eqs. (3.1.4) and (3.1.5). Solving these equations simultaneously, we obtain the OLS estimators given in Eqs. (3.1.6) and (3.1.7). Observe very carefully that in these equations the unknowns ($\beta$’s) are on the left-hand side and the knowns (X and Y) are on the right-hand side. As a result we get explicit solutions of the two unknowns in terms of our data.

Now see what happens if we try to minimize the RSS of Eq. (14.2.2). As shown in Appendix 14A, Section 
14A.1, the normal equations corresponding to Eqs. (3.1.4) and (3.1.5) are as follows: 


$$\sum Y_i e^{\hat\beta_2 X_i} = \hat\beta_1 \sum e^{2\hat\beta_2 X_i} \qquad (14.2.4)$$

$$\sum Y_i X_i e^{\hat\beta_2 X_i} = \hat\beta_1 \sum X_i e^{2\hat\beta_2 X_i} \qquad (14.2.5)$$

Unlike the normal equations in the case of the linear regression model, the normal equations for nonlinear regression have the unknowns (the $\hat\beta$’s) on both the left- and right-hand sides of the equations. As a consequence, we cannot obtain explicit solutions of the unknowns in terms of the known quantities. To put it differently, the unknowns are expressed in terms of themselves and the data! Therefore, although we can apply the method of least squares to estimate the parameters of the nonlinear regression models, we cannot obtain explicit solutions of the unknowns. Incidentally, OLS applied to a nonlinear regression model is called nonlinear least squares (NLLS). So, what is the solution? We take this question up next.


14.3 Estimating Nonlinear Regression Models: The Trial-and-Error 
Method 


To set the stage, let us consider a concrete example. The data in Table 14.1 relate to the management fees that a leading mutual fund in the United States pays to its investment advisors to manage its assets. The fees paid depend on the net asset value of the fund. As you can see, the higher the net asset value of the fund, the lower the advisory fees, as Figure 14.1 shows clearly.

Table 14.1 Advisory Fees Charged and Asset Size

Observation   Fee, %    Asset*
 1            0.520      0.5
 2            0.508      5.0
 3            0.484     10
 4            0.46      15
 5            0.4398    20
 6            0.4238    25
 7            0.4115    30
 8            0.402     35
 9            0.3944    40
10            0.388     45
11            0.3825    55
12            0.3738    60

*Asset represents net asset value, billions of dollars.

Figure 14.1 Relationship of advisory fees to assets (fee, %, on the vertical axis; asset, billions of dollars, on the horizontal axis).


To see how the exponential regression model in Eq. (14.2.2) fits the data given in Table 14.1, we can proceed by trial and error. Suppose we assume that initially $\beta_1 = 0.45$ and $\beta_2 = 0.01$. These are pure guesses, sometimes based on prior experience or prior empirical work or obtained by just fitting a linear regression model even though it may not be appropriate. At this stage do not worry about how these values are obtained.

Since we know the values of $\beta_1$ and $\beta_2$, we can write Eq. (14.2.2) as:


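$$u_i = Y_i - \beta_1 e^{\beta_2 X_i} = Y_i - 0.45 e^{0.01 X_i} \qquad (14.3.1)$$

Therefore,

$$\sum u_i^2 = \sum \left(Y_i - 0.45 e^{0.01 X_i}\right)^2 \qquad (14.3.2)$$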


Since $Y$, $X$, $\beta_1$, and $\beta_2$ are known, we can easily find the error sum of squares in Eq. (14.3.2).⁴ Remember that in OLS our objective is to find those values of the unknown parameters that make the error sum of squares as small as possible. This will happen if the estimated Y values from the model are as close as possible to the actual Y values. With the given values, we obtain $\sum u_i^2 = 0.3044$. But how do we know that this is the smallest error sum of squares we can obtain? What happens if you choose other values for $\beta_1$ and $\beta_2$, say, 0.50 and −0.01, respectively? Repeating the procedure just laid down, we now obtain $\sum u_i^2 = 0.0073$. Obviously, this error sum of squares is much smaller than the one obtained before, namely, 0.3044. But how do we know that we have reached the lowest possible error sum of squares, if by choosing yet another set of values for the β's we will obtain yet another error sum of squares?
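As a check on the arithmetic of this trial-and-error step, the short Python sketch below evaluates the error sum of squares of Eq. (14.3.2) at both trial pairs, using the Table 14.1 values as transcribed here. The names `FEE`, `ASSET`, and `error_ss` are ours, and the sums it prints depend on this transcription, so they need not reproduce the figures quoted in the text exactly.

```python
import math

# Table 14.1 (fee in percent, asset in billions of dollars), as transcribed.
FEE = [0.520, 0.508, 0.484, 0.46, 0.4398, 0.4238,
       0.4115, 0.402, 0.3944, 0.388, 0.3825, 0.3738]
ASSET = [0.5, 5.0, 10, 15, 20, 25, 30, 35, 40, 45, 55, 60]


def error_ss(b1, b2):
    """Eq. (14.3.2) for general trial values: sum of (Y - b1*exp(b2*X))^2."""
    return sum((y - b1 * math.exp(b2 * x)) ** 2 for y, x in zip(FEE, ASSET))


for b1, b2 in [(0.45, 0.01), (0.50, -0.01)]:
    print(f"b1 = {b1:.2f}, b2 = {b2:+.2f}: error SS = {error_ss(b1, b2):.4f}")
```

Whatever the exact numbers, the second pair gives a much smaller error sum of squares than the first, which is the point of the comparison.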

As you can see, such a trial-and-error, or iterative, process can be easily implemented. And if one has infinite time and infinite patience, the trial-and-error process may ultimately produce values of $\beta_1$ and $\beta_2$ that yield the lowest possible error sum of squares. But you might ask, how did we go from ($\beta_1 = 0.45$, $\beta_2 = 0.01$) to ($\beta_1 = 0.50$, $\beta_2 = -0.01$)? Clearly, we need some kind of algorithm that tells us how to go from one set of values of the unknowns to another before we stop. Fortunately, such algorithms are available, and we discuss them in the next section.


⁴Note that we call $\sum u_i^2$ the error sum of squares and not the usual residual sum of squares, because the values of the parameters are assumed to be known.
14.4 Approaches to Estimating Nonlinear Regression Models


There are several approaches, or algorithms, to estimating NLRMs: (1) direct search or trial and error, (2) direct optimization, and (3) iterative linearization.


Direct Search or Trial-and-Error or Derivative-Free Method 


In the previous section we showed how this method works. Although it is intuitively appealing because, unlike the other methods, it does not require calculus, this method is generally not used. First, if an NLRM involves several parameters, the method becomes very cumbersome and computationally expensive. For example, if an NLRM involves 5 parameters and 25 alternative values for each parameter are considered, you will have to compute the error sum of squares $25^5 = 9{,}765{,}625$ times! Second, there is no guarantee that the final set of parameter values you have selected will give you the absolute minimum error sum of squares. In the language of calculus, you may obtain a local and not an absolute (global) minimum. In fact, no method guarantees a global minimum.
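The cost of a full grid grows as (number of candidate values) raised to the (number of parameters). The Python sketch below confirms the count quoted above and then runs a coarse 25 × 25 direct search over the two parameters of Eq. (14.2.2) for the Table 14.1 data. The grid bounds and the helper names are arbitrary assumptions made for illustration. The best grid point is only as good as the grid spacing, and the search says nothing about a lower error sum of squares outside the grid.

```python
import itertools
import math

# Table 14.1 (fee in percent, asset in billions of dollars), as transcribed.
FEE = [0.520, 0.508, 0.484, 0.46, 0.4398, 0.4238,
       0.4115, 0.402, 0.3944, 0.388, 0.3825, 0.3738]
ASSET = [0.5, 5.0, 10, 15, 20, 25, 30, 35, 40, 45, 55, 60]


def error_ss(b1, b2):
    """Error sum of squares for Y = b1 * exp(b2 * X)."""
    return sum((y - b1 * math.exp(b2 * x)) ** 2 for y, x in zip(FEE, ASSET))


def grid(lo, hi, n):
    """n evenly spaced candidate values from lo to hi (inclusive)."""
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]


# Full-grid cost quoted in the text: 25 candidates for each of 5 parameters.
print(f"{25 ** 5:,} evaluations")  # 9,765,625 evaluations

# Coarse 25 x 25 grid for the two-parameter model (bounds are arbitrary).
candidates = itertools.product(grid(0.40, 0.60, 25), grid(-0.02, 0.02, 25))
best_sse, best_b1, best_b2 = min((error_ss(b1, b2), b1, b2) for b1, b2 in candidates)
print(f"best grid point: b1 = {best_b1:.4f}, b2 = {best_b2:.4f}, SSE = {best_sse:.5f}")
```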


Direct Optimization 


In direct optimization we differentiate the error sum of squares with respect to each unknown coefficient, or parameter, set the resulting equations to zero, and solve the resulting normal equations simultaneously. We have already seen this in Eqs. (14.2.4) and (14.2.5). But as you can see from these equations, they cannot be solved explicitly or analytically. Some iterative routine is therefore called for. One such routine is the method of steepest descent. We will not discuss the technical details of this method, as they are somewhat involved, but the reader can find them in the references. Like the method of trial and error, the method of steepest descent involves selecting initial trial values of the unknown parameters, but it then proceeds more systematically than the hit-or-miss, trial-and-error method. One disadvantage of this method is that it may converge to the final values of the parameters extremely slowly.
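For readers who want to see the idea, here is a minimal Python sketch of steepest descent for Eq. (14.2.2). It assumes a simple backtracking (Armijo) step-size rule, which is our choice and not necessarily what any package uses. The gradient is the pair of partial derivatives of the error sum of squares given in Appendix 14A, Eqs. (3) and (4). Each accepted step lowers the error sum of squares. Because the two parameters are on very different scales, however, progress toward the minimum can be slow, which is the disadvantage noted above.

```python
import math

# Table 14.1 (fee in percent, asset in billions of dollars), as transcribed.
FEE = [0.520, 0.508, 0.484, 0.46, 0.4398, 0.4238,
       0.4115, 0.402, 0.3944, 0.388, 0.3825, 0.3738]
ASSET = [0.5, 5.0, 10, 15, 20, 25, 30, 35, 40, 45, 55, 60]


def error_ss(b1, b2):
    """Error sum of squares for Y = b1 * exp(b2 * X); inf if exp overflows."""
    try:
        return sum((y - b1 * math.exp(b2 * x)) ** 2 for y, x in zip(FEE, ASSET))
    except OverflowError:
        return math.inf


def gradient(b1, b2):
    """Partial derivatives of the error sum of squares w.r.t. b1 and b2."""
    g1 = g2 = 0.0
    for y, x in zip(FEE, ASSET):
        e = math.exp(b2 * x)
        r = y - b1 * e
        g1 += -2.0 * r * e
        g2 += -2.0 * r * b1 * x * e
    return g1, g2


def steepest_descent(b1, b2, max_iter=500):
    """Move along the negative gradient, halving the step until SSE falls enough."""
    sse = error_ss(b1, b2)
    history = [sse]
    for _ in range(max_iter):
        g1, g2 = gradient(b1, b2)
        gnorm2 = g1 * g1 + g2 * g2
        step = 1.0
        while step > 1e-12:
            n1, n2 = b1 - step * g1, b2 - step * g2
            new_sse = error_ss(n1, n2)
            # Armijo condition: require a decrease proportional to the step.
            if new_sse <= sse - 1e-4 * step * gnorm2:
                break
            step *= 0.5
        else:
            break  # no acceptable step found; stop iterating
        b1, b2, sse = n1, n2, new_sse
        history.append(sse)
    return b1, b2, history


b1, b2, history = steepest_descent(0.45, 0.01)
print(f"after {len(history) - 1} steps: b1 = {b1:.4f}, b2 = {b2:.5f}, "
      f"error SS = {history[-1]:.5f}")
```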


Iterative Linearization Method 


In this method we linearize a nonlinear equation around some initial values of the parameters. The linearized equation is then estimated by OLS, and the initially chosen values are adjusted. These adjusted values are used to relinearize the model, and again we estimate it by OLS and readjust the estimated values. This process is continued until there is no substantial change in the estimated values from the last couple of iterations. The main technique used in linearizing a nonlinear equation is the Taylor series expansion from calculus. Rudimentary details of this method are given in Appendix 14A, Section 14A.2. Estimating NLRMs using Taylor series expansion is systematized in two algorithms, known as the Gauss–Newton iterative method and the Newton–Raphson iterative method. Since one or both of these methods are now incorporated in several computer packages, and since a discussion of their technical details would take us far beyond the scope of this book, there is no need to dwell on them here. In the next section we discuss some examples using these methods.
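A minimal Python sketch of the iterative linearization (Gauss–Newton) idea for Eq. (14.2.2) follows. At each pass it regresses the current residuals, without an intercept, on the two partial derivatives $e^{\beta_2 X_i}$ and $\beta_1 X_i e^{\beta_2 X_i}$ evaluated at the current values, as in Appendix 14A, Section 14A.3, and adds the estimated corrections to the parameters. The stopping tolerance and iteration cap are arbitrary choices. Commercial packages add safeguards (step halving, Marquardt-type damping) that this sketch omits.

```python
import math

# Table 14.1 (fee in percent, asset in billions of dollars), as transcribed.
FEE = [0.520, 0.508, 0.484, 0.46, 0.4398, 0.4238,
       0.4115, 0.402, 0.3944, 0.388, 0.3825, 0.3738]
ASSET = [0.5, 5.0, 10, 15, 20, 25, 30, 35, 40, 45, 55, 60]


def gauss_newton(b1, b2, tol=1e-9, max_iter=50):
    """Gauss-Newton iterations for Y = b1 * exp(b2 * X) + u."""
    path = [(b1, b2)]
    for _ in range(max_iter):
        # Accumulate the 2x2 normal equations of the no-intercept OLS
        # regression of the current residual on the two derivative columns.
        s11 = s12 = s22 = s1r = s2r = 0.0
        for y, x in zip(FEE, ASSET):
            z1 = math.exp(b2 * x)   # derivative of b1*exp(b2*x) w.r.t. b1
            z2 = b1 * x * z1        # derivative w.r.t. b2
            r = y - b1 * z1         # current residual
            s11 += z1 * z1
            s12 += z1 * z2
            s22 += z2 * z2
            s1r += z1 * r
            s2r += z2 * r
        det = s11 * s22 - s12 * s12
        a1 = (s22 * s1r - s12 * s2r) / det   # OLS correction to b1
        a2 = (s11 * s2r - s12 * s1r) / det   # OLS correction to b2
        b1, b2 = b1 + a1, b2 + a2
        path.append((b1, b2))
        if abs(a1) < tol and abs(a2) < tol:
            return b1, b2, path, True
    return b1, b2, path, False


b1_hat, b2_hat, path, converged = gauss_newton(0.45, 0.01)
print("first update:", tuple(round(v, 5) for v in path[1]))
print("final:", round(b1_hat, 4), round(b2_hat, 4),
      "converged:", converged, "iterations:", len(path) - 1)
```

Started from the trial values 0.45 and 0.01, the first update should agree with the one-step values reported in Eq. (9) of Appendix 14A (about 0.4727 and −0.00069). The number of iterations need not match EViews, whose default algorithm differs.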


⁵The following discussion leans heavily on these sources: Robert S. Pindyck and Daniel L. Rubinfeld, Econometric Models and Economic Forecasts, 4th ed., McGraw-Hill, 1998, Chapter 10; Norman R. Draper and Harry Smith, Applied Regression Analysis, 3d ed., John Wiley & Sons, 1998, Chapter 24; Arthur S. Goldberger, A Course in Econometrics, Harvard University Press, 1991, Chapter 29; Russell Davidson and James MacKinnon, op. cit., pp. 201–207; John Fox, Applied Regression Analysis, Linear Models, and Related Methods, Sage Publications, 1997, pp. 393–400; and Ronald Gallant, Nonlinear Statistical Models, John Wiley & Sons, 1987.

⁶There is another method that is sometimes used, called the Marquardt method, which is a compromise between the method of steepest descent and the linearization (or Taylor series) method. The interested reader may consult the references for the details of this method.
14.5 Illustrative Examples


Example 14.1 Mutual Fund Advisory Fees 


Refer to the data given in Table 14.1 and the NLRM (14.2.2). Using the EViews 6 nonlinear regression routine, which uses the linearization method,⁷ we obtained the following regression results; the coefficients, their standard errors, and their t values are given in tabular form:


Variable    Coefficient   Std. Error   t Value     p Value
Intercept    0.5089       0.0074        68.2246    0.0000
Asset       −0.0059       0.00048      −12.3150    0.0000

R² = 0.9385    d = 0.3493
From these results, we can write the estimated model as: 


$$\widehat{\text{Fee}}_i = 0.5089\, e^{-0.0059\,\text{Asset}_i} \qquad (14.5.1)$$


Before we discuss these results, it may be noted that if you do not supply the initial values of the parameters to start the linearization process, EViews will do it on its own. It took EViews five iterations to obtain the results shown in Eq. (14.5.1). However, you can supply your own initial values to start the process. To demonstrate, we chose the initial values $\beta_1 = 0.45$ and $\beta_2 = 0.01$. We obtained the same results as in Eq. (14.5.1), but it took eight iterations. It is important to note that fewer iterations will generally be required if your initial values are not very far from the final values. In some cases you can choose the initial values of the parameters by simply running an OLS regression of the regressand on the regressor(s), ignoring the nonlinearities. For instance, using the data in Table 14.1, if you were to regress fee on assets, the OLS estimate of $\beta_1$ is 0.5028 and that of $\beta_2$ is −0.002, which are much closer to the final values given in Eq. (14.5.1). (For the technical details, see Appendix 14A, Section 14A.3.)
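The OLS starting values mentioned above are just the intercept and slope of a straight-line fit of fee on assets. A minimal Python sketch of that calculation follows, using the Table 14.1 values as transcribed here. Its output can be fed to any NLLS routine as starting values. Note that the OLS slope belongs to a linear model, not to the exponent in Eq. (14.2.2), so it serves only as a rough starting guess for $\beta_2$.

```python
# Table 14.1 (fee in percent, asset in billions of dollars), as transcribed.
FEE = [0.520, 0.508, 0.484, 0.46, 0.4398, 0.4238,
       0.4115, 0.402, 0.3944, 0.388, 0.3825, 0.3738]
ASSET = [0.5, 5.0, 10, 15, 20, 25, 30, 35, 40, 45, 55, 60]

n = len(FEE)
x_bar = sum(ASSET) / n
y_bar = sum(FEE) / n

# Two-variable OLS: slope = S_xy / S_xx, intercept = y_bar - slope * x_bar.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ASSET, FEE))
s_xx = sum((x - x_bar) ** 2 for x in ASSET)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

print(f"starting values: b1 = {intercept:.4f}, b2 = {slope:.4f}")
```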

Now about the properties of nonlinear least squares (NLLS) estimators. You may recall that, in the case of linear regression models with normally distributed error terms, we were able to develop exact inference procedures (i.e., tests of hypotheses) using the t, F, and $\chi^2$ tests in small as well as large samples. Unfortunately, this is not the case with NLRMs, even with normally distributed error terms. The NLLS estimators are not normally distributed, are not unbiased, and do not have minimum variance in finite, or small, samples. As a result, we cannot use the t test (to test the significance of an individual coefficient) or the F test (to test the overall significance of the estimated regression), because we cannot obtain an unbiased estimate of the error variance $\sigma^2$ from the estimated residuals. Furthermore, the residuals (the differences between the actual Y values and the estimated Y values from the NLRM) do not necessarily sum to zero, ESS and RSS do not necessarily add up to the TSS, and therefore $R^2 = \text{ESS}/\text{TSS}$ may not be a meaningful descriptive statistic for such models. However, we can compute $R^2$ as:

$$R^2 = 1 - \frac{\sum \hat{u}_i^2}{\sum (Y_i - \bar{Y})^2} \qquad (14.5.2)$$

where $Y$ = regressand and $\hat{u}_i = Y_i - \hat{Y}_i$, where $\hat{Y}_i$ are the estimated Y values from the (fitted) NLRM.

Consequently, inferences about the regression parameters in nonlinear regression are usually based on large-sample theory. This theory tells us that the least-squares and maximum likelihood estimators for nonlinear regression models with normal error terms, when the sample size is large, are approximately normally distributed and almost unbiased, and have almost minimum variance. This large-sample theory also applies when the error terms are not normally distributed.⁸


⁷EViews provides three options: quadratic hill climbing, Newton–Raphson, and Berndt–Hall–Hall–Hausman. The default option is quadratic hill climbing, which is a variation of the Newton–Raphson method.


⁸John Neter, Michael H. Kutner, Christopher J. Nachtsheim, and William Wasserman, Applied Regression Analysis, 3d ed., Irwin, 1996, pp. 548–549.
In short, then, all inference procedures in NLRMs are large-sample, or asymptotic. Returning to Example 14.1, the t statistics given in Eq. (14.5.1) are meaningful only if interpreted in the large-sample context. In that sense, we can say that the estimated coefficients shown in Eq. (14.5.1) are individually statistically significant. Of course, our sample in the present instance is rather small.

Returning to Eq. (14.5.1), how do we find the rate of change of Y (= fee) with respect to X (asset size)? Using the basic rules of derivatives, the reader can see that the rate of change of Y with respect to X is:

$$\frac{dY}{dX} = \beta_1 \beta_2 e^{\beta_2 X} = (-0.0059)(0.5089)\,e^{-0.0059 X} \qquad (14.5.3)$$


As can be seen, the rate of change of the fee depends on the value of the assets. For example, if X = 20 (billion dollars), the expected rate of change in the fees charged can be seen from Eq. (14.5.3) to be about −0.0027 percentage point per additional billion dollars of assets. Of course, this answer will change depending on the X value used in the computation. Judged by the $R^2$ computed from Eq. (14.5.2), the $R^2$ value of 0.9385 suggests that the chosen NLRM fits the data in Table 14.1 quite well. The estimated Durbin–Watson value of 0.3493 may suggest that there is autocorrelation or possibly model specification error. Although there are procedures to take care of these problems, as well as the problem of heteroscedasticity, in NLRMs, we will not pursue these topics here. The interested reader may consult the references.
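Evaluating Eq. (14.5.3) at a few asset levels makes the dependence on X concrete. The Python sketch below simply plugs the rounded estimates 0.5089 and −0.0059 into $\beta_1 \beta_2 e^{\beta_2 X}$. The asset values chosen are illustrative.

```python
import math

B1, B2 = 0.5089, -0.0059  # rounded NLLS estimates from Eq. (14.5.1)


def fee_slope(asset):
    """dY/dX = b1 * b2 * exp(b2 * X): change in fee (percentage points) per $1 billion."""
    return B1 * B2 * math.exp(B2 * asset)


for asset in (5, 20, 50):
    print(f"X = {asset:>2}: dY/dX = {fee_slope(asset):.5f}")
```

Because $\beta_2$ is negative, the slope shrinks in absolute value as assets grow.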


Example 14.2 The Cobb-Douglas Production Function of the Mexican Economy 


Refer to the data given in Exercise 14.9 (Table 14.3). These data refer to the Mexican economy for the years 1955–1974. We will see if the NLRM given in Eq. (14.1.4) fits the data, noting that Y = output, $X_2$ = labor input, and $X_3$ = capital input. Using EViews 6, we obtained the following regression results after 32 iterations.


Variable    Coefficient   Std. Error   t Value    p Value
Intercept   0.5292        0.2712        1.9511    0.0677
Labor       0.1810        0.1412        1.2814    0.2173
Capital     0.8827        0.0708       12.4658    0.0000

R² = 0.9942    d = 0.2899
Therefore, the estimated Cobb-Douglas production function is: 
$$\widehat{\text{GDP}}_t = 0.5292\,\text{Labor}_t^{0.1810}\,\text{Capital}_t^{0.8827} \qquad (14.5.4)$$


Interpreted asymptotically, the equation shows that only the coefficient of the capital input is significant in this model. In Exercise 14.9 you are asked to compare these results with those obtained from the multiplicative Cobb-Douglas production function given in Eq. (14.1.2).


Example 14.3 Growth of U.S. Population, 1970-2007 


Table 14.2, in Exercise 14.8, gives data on total U.S. population for the period 1970–2007. A logistic model of the following type is often used to measure the growth of some populations, such as those of human beings or bacteria:

$$Y_t = \frac{\beta_1}{1 + e^{(\beta_2 + \beta_3 t)}} + u_t \qquad (14.5.5)$$

where Y = population, in thousands; t = time, measured chronologically; and the β's are the parameters.

This model is nonlinear in the parameters; there is no simple way to convert it into a model that is linear in the parameters. So we will need to use one of the nonlinear estimation methods to estimate the parameters. Notice an interesting feature of this model: Although there are only two variables in the model, population and time, there are three unknown parameters, which shows that in an NLRM there can be more parameters than variables.
An attempt to fit Eq. (14.5.5) to our data was not successful, as all the estimated coefficients were
statistically insignificant. This is probably not surprising, for if we plot population against time, we obtain 
Figure 14.2. 
Figure 14.2 Population versus Year.


This figure shows that there is an almost linear relationship between the two variables. If we plot the logarithm 
of population against time, we obtain the following figure: 
Figure 14.3 Logarithm of Population versus Year.


The slope of this figure (multiplied by 100) gives us the growth rate of population (why?). 
As a matter of fact, if we regress the log of population on time, we get the following results:

Dependent Variable: LPOPULATION
Method: Least Squares
Sample: 1970–2007
Included observations: 38

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           −8.710413     0.147737     −58.95892     0.0000
YEAR         0.010628     7.43E-05     143.0568      0.0000

R-squared             0.998244    Mean dependent var.     w2 A2405
Adjusted R-squared    0.998195    S.D. dependent var.     on EIS 27
S.E. of regression    0.005022    Akaike info criterion   −7.698713
Sum squared resid.    0.000908    Schwarz criterion       −7.612525
Log likelihood        148.2756    Hannan-Quinn criter.    −7.668048
F-statistic           20465.26    Durbin-Watson stat.     0.366006
Prob. (F-statistic)   0.000000


This table shows that, over the period 1970–2007, the U.S. population grew at the rate of about 1.06 percent per year. The $R^2$ value of 0.998 suggests an almost perfect fit.
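The step from the YEAR coefficient to "about 1.06 percent per year" is a one-line calculation. The Python sketch below also shows the compound annual rate $e^{b} - 1$, a standard companion to the instantaneous rate $100b$ read off a semilog slope. Which one to report is a convention; the regression output does not decide it.

```python
import math

slope = 0.010628  # YEAR coefficient from the semilog regression

instantaneous = 100 * slope              # percent per year, 100 * d ln(Y)/dt
compound = 100 * (math.exp(slope) - 1)   # percent per year, year over year

print(f"instantaneous growth rate: {instantaneous:.4f}% per year")
print(f"compound annual growth rate: {compound:.4f}% per year")
```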

This example brings out an important point: sometimes a linear (in the parameters) model may be preferable to a nonlinear (in the parameters) model.


Example 14.4 Box-Cox Transformation: U.S. Population 1970-2007 


In Appendix 6A.5 we briefly considered the Box-Cox transformation. Let us continue with Example 14.3 but assume the following model:

$$\text{Population}^{\lambda} = \beta_1 + \beta_2\,\text{Year} + u$$

As noted in Appendix 6A.5, depending on the value of λ we have the following possibilities:

Value of λ   Model
−1           1/Population = β1 + β2 Year + u
 0           ln Population = β1 + β2 Year + u
 1           Population = β1 + β2 Year + u

The first is an inverse model, the second is a semilog model (which we have already estimated in Example 14.3), and the third is a linear (in the variables) model.
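The three cases in the table are special values of a single power transformation. A minimal Python sketch of the standard Box-Cox form $(Y^{\lambda} - 1)/\lambda$, with $\ln Y$ as its limit at $\lambda = 0$, is given below. One assumption should be noted: for $\lambda \neq 0$ this standard form differs from $\text{Population}^{\lambda}$ by a linear rescaling (for $\lambda = -1$ it gives $1 - 1/Y$). That rescaling changes the values of $\beta_1$ and $\beta_2$ but not the quality of the fit. The population value used is simply the 1970 entry of Table 14.2.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox transform: (y**lam - 1) / lam, and log(y) when lam == 0."""
    if y <= 0:
        raise ValueError("the Box-Cox transform needs a positive variable")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


population_1970 = 205052.0  # first entry of Table 14.2
for lam in (-1, 0, 1):
    print(lam, box_cox(population_1970, lam))

# The lam -> 0 limit: a tiny lam gives almost exactly the log.
print(box_cox(population_1970, 1e-8), math.log(population_1970))
```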
Which of these models is appropriate for the population data? The Box-Cox routine in STATA (Version 10) can be used to answer this question:


Test        Restricted        LR statistic   p-value
H0:         log likelihood    χ²             Prob > χ²
θ = −1      −444.42475        0.14           0.707
θ = 0       −444.38813        0.07           0.794
θ = 1       −444.75684        0.81           0.369
Note: In our notation, theta (θ) is the same thing as lambda (λ). The table shows that, on the basis of the likelihood ratio (LR) test, we cannot reject any of these λ values as possible values for the power of population; that is, in the present example, the linear, inverse, and semilog models are equally plausible candidates to depict the behavior of population over the sample period 1970–2007. Therefore, we present the results of all three models:


Dependent variable   Intercept    Slope         R²
1/Population          0.000089    −4.28e-08     0.9986
  t                  (166.14)     (−1568.10)
ln Population        −8.7104       0.0106       0.9982
  t                  (−58.96)     (143.06)
Population           −5042627      2661.825     0.9928
  t                  (−66.92)     (70.24)


In all of these models the estimated coefficients are highly statistically significant. But note that the $R^2$ values are not directly comparable, because the dependent variables in the three models are different.
This example shows how nonlinear estimation techniques can be applied in concrete situations.
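Each null hypothesis in the Box-Cox LR table fixes a single parameter. The LR statistic, $2(\ln L_{\text{unrestricted}} - \ln L_{\text{restricted}})$, is therefore compared with a $\chi^2$ distribution with 1 degree of freedom. The Python sketch below recomputes the p-values from the reported LR statistics using the identity $P(\chi^2_1 > c) = \operatorname{erfc}(\sqrt{c/2})$. Because the statistics are printed to only two decimals, the recomputed p-values agree with the table only to roughly two decimal places.

```python
import math


def chi2_1df_pvalue(lr):
    """Upper-tail probability P(chi-square with 1 df > lr) = erfc(sqrt(lr / 2))."""
    return math.erfc(math.sqrt(lr / 2.0))


# (value of lambda under H0, reported LR statistic)
for lam, lr in ((-1, 0.14), (0, 0.07), (1, 0.81)):
    print(f"lambda = {lam:>2}: LR = {lr:.2f}, p-value = {chi2_1df_pvalue(lr):.3f}")
```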


Summary and Conclusions 


The main points discussed in this chapter can be summarized as follows: 


1. Although linear regression models predominate in theory and practice, there are occasions where nonlinear-in-the-parameter regression models (NLRMs) are useful.
2. The mathematics underlying linear regression models is comparatively simple in that one can obtain explicit, or analytical, solutions for the coefficients of such models. The small-sample and large-sample theory of inference for such models is well established.
3. In contrast, for intrinsically nonlinear regression models, parameter values cannot be obtained explicitly. They have to be estimated numerically, that is, by iterative procedures.
4. There are several methods of obtaining estimates of NLRMs, such as (1) trial and error, (2) nonlinear least squares (NLLS), and (3) linearization through Taylor series expansion.
5. Computer packages now have built-in routines, such as Gauss–Newton, Newton–Raphson, and Marquardt. These are all iterative routines.
6. NLLS estimators do not possess optimal properties in finite samples, but in large samples they do have such properties. Therefore, the results of NLLS in small samples must be interpreted carefully.
7. Autocorrelation, heteroscedasticity, and model specification problems can plague NLRMs, as they do linear regression models.
8. We illustrated NLLS with several examples. With the ready availability of user-friendly software packages, estimation of NLRMs should no longer be a mystery. Therefore, the reader should not shy away from such models whenever theoretical or practical reasons dictate their use. As a matter of fact, if you refer to Exercise 12.10, you will see from Eq. (1) that it is an intrinsically nonlinear regression model that should be estimated as such.
Multiple Choice Questions


1. Nonlinear regression models are
a. Linear in parameters and variables
b. Nonlinear in parameters and variables
c. Linear in parameters and may or may not be linear in variables
d. Nonlinear in parameters and may or may not be linear in variables
2. The regression model $Y_i = e^{\beta_1 + \beta_2 X_i + u_i}$ is
a. Intrinsically linear model
b. Intrinsically nonlinear model
c. Taylor series
d. Nonlinear in variables


3. The regression model in}
a. Intrinsically linear model
b. Intrinsically nonlinear model
c. Taylor series
d. Nonlinear in variables
4. The regression model Y, = B, + B;X; + u; is
a. Intrinsically linear model
b. Intrinsically nonlinear model
c. Taylor series
d. Nonlinear in variables
5. The regression model Y, = p, X f Xa emis
a. Intrinsically linear model
b. Intrinsically nonlinear model
c. Taylor series
d. Nonlinear in variables


6. The regression model $Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} e^{u_i}$ is
a. Logistic (probability) distribution function
b. Cobb-Douglas production function
c. Constant elasticity of substitution production function
d. Exponential regression model

7. The regression model $Y_i = \beta_1 e^{\beta_2 X_i} + u_i$ is
a. Logistic (probability) distribution function
b. Cobb-Douglas production function
c. Constant elasticity of substitution production function
d. Exponential regression model

8. The regression model $Y_i = A\left[\delta K_i^{-\beta} + (1 - \delta) L_i^{-\beta}\right]^{-1/\beta}$ is
a. Logistic (probability) distribution function
b. Cobb-Douglas production function
c. Constant elasticity of substitution production function
d. Exponential regression model


9. The regression model Y, = == is 
a. Logistic (probability) distribution function 
b. Cobb-Douglas production function 
c. Constant elasticity of substitution production function 
d. Exponential regression model 
10. Which of these is surely not an estimation procedure for NLRMs?
a. Derivative-free method
b. Method of steepest descent
c. Ordinary least squares method
d. Iterative linearization method
11. Estimating NLRMs using iterative methods results in
a. Absolute minimum
b. Local minimum
c. Global minimum
d. Always BUE estimators
12. The iterative process of estimating NLRMs guarantees estimates that result in the least error sum of squares. This statement is
a. True
b. False
c. Dependent on the initial values of the parameters selected
d. Dependent on the number of parameters
13. The iterative methods used to linearize a nonlinear model are
a. Derivative-free method and method of steepest descent
b. Method of steepest descent and Gauss–Newton iterative method
c. Gauss–Newton iterative method and Newton–Raphson iterative method
d. Newton–Raphson iterative method and direct optimization method
14. The linearization of a nonlinear equation is based on the technique of
a. Direct search method
b. Taylor series expansion
c. Method of steepest descent
d. Hit-or-miss method
15. The nonlinear least squares estimators possess optimal properties in
a. Finite samples
b. Large samples
c. Both small and large samples
d. Neither finite nor large samples
Exercises


Questions 


14.1. What is meant by intrinsically linear and intrinsically nonlinear regression models? Give some 
examples. 

14.2. Since the error term in the Cobb-Douglas production function can be entered multiplicatively or 
additively, how would you decide between the two? 

14.3. What is the difference between OLS and nonlinear least-squares (NLLS) estimation? 

14.4. The relationship between pressure and temperature in saturated steam can be expressed as:

$$Y_i = \beta_1 (10)^{\beta_2 t_i/(\gamma + t_i)} + u_i$$

where Y = pressure and t = temperature. Using the method of nonlinear least squares (NLLS), obtain the normal equations for this model.

14.5. State whether the following statements are true or false. Give your reasoning. 

a. Statistical inference in NLLS regression cannot be made on the basis of the usual t, F, and $\chi^2$ tests even if the error term is assumed to be normally distributed.
b. The coefficient of determination ($R^2$) is not a particularly meaningful number for an NLRM.

14.6. How would you linearize the CES production function discussed in the chapter? Show the necessary 
steps. 

14.7. Models that describe the behavior of a variable over time are called growth models. Such models are used in a variety of fields, such as economics, biology, botany, ecology, and demography. Growth models can take a variety of forms, both linear and nonlinear. Consider the following models, where Y is the variable whose growth we want to measure; t is time, measured chronologically; and $u_t$ is the stochastic error term.

a. $Y_t = \beta_1 + \beta_2 t + u_t$
b. $\ln Y_t = \beta_1 + \beta_2 t + u_t$
c. Logistic growth model: $Y_t = \dfrac{\beta_1}{1 + \beta_2 e^{-\beta_3 t}} + u_t$
d. Gompertz growth model: $Y_t = \beta_1 e^{-\beta_2 e^{-\beta_3 t}} + u_t$

Find out the properties of these models by considering the growth of Y in relation to time.


Empirical Exercises 


14.8. The data in Table 14.2 give U.S. population, in thousands of persons, for the period 1970–2007. Fit the growth models given in Exercise 14.7 and decide which model gives a better fit. Interpret the parameters of the model.

14.9. Table 14.3 gives data on real GDP, labor, and capital for Mexico for the period 1955-1974. See if 
the multiplicative Cobb-Douglas production function given in Eq. (14.1.2a) fits these data. Compare 
your results with those obtained from fitting the additive Cobb-Douglas production function given in 
Eq. (14.1.4), whose results are given in Example 14.2. Which is a better fit? 


⁹Adapted from Draper and Smith, op. cit., p. 554.
Table 14.2 U.S. Population (Thousands)

Year   Population     Year   Population
1970   205,052        1989   247,342
1971   207,661        1990   250,132
1972   209,896        1991   253,493
1973   211,909        1992   256,894
1974   213,854        1993   260,255
1975   215,973        1994   263,436
1976   218,035        1995   266,557
1977   220,239        1996   269,667
1978   222,585        1997   272,912
1979   225,055        1998   276,115
1980   227,726        1999   279,295
1981   229,966        2000   282,407
1982   232,188        2001   285,339
1983   234,307        2002   288,189
1984   236,348        2003   290,941
1985   238,466        2004   293,609
1986   240,651        2005   299,801
1987   242,804        2006   299,157
1988   245,021        2007   302,405

Source: Economic Report of the President, 2008.


Table 14.3 Production Function Data for the Mexican Economy

Observation   GDP       Labor    Capital      Observation   GDP       Labor    Capital
1955          114,043   8,310    182,113      1965          212,323   11,746   3957715
1956          120,410   8,529    193,749      1966          226,977   11,521   337,642
1957          129,187   8,738    205,192      1967          241,194   11,540   363,599
1958          134,705   8,952    215,130      1968          260,881   12,066   391,847
1959          139,960   Ba       225,021      1969          277,498   12,297   422,382
1960          150,511   9,569    237,026      1970          296,530   W29955   455,049
1961          157,897   9,527    248,897      1971          306,712   1¥,338   484,677
1962          165,286   9,662    260,661      1972          329,030   13,738   520,553
1963          178,491   10,334   275,466      1973          354,057   15,924   561,531
1964          199,457   10,981   295,378      1974          374,977   14,154   609,825

Notes: GDP is in millions of 1960 pesos. Labor is in thousands of people. Capital is in millions of 1960 pesos.

Source: Victor J. Elias, Sources of Growth: A Study of Seven Latin American Economies, International Center for Economic Growth, ICS Press, San Francisco, 1992, Tables E-5, E-12, E-14.


Key to Multiple Choice Questions

1. (d)    2. (a)    3. (a)    4. (b)    5. (b)    6. (b)    7. (d)    8. (c)
9. (a)    10. (c)   11. (b)   12. (a)   13. (c)   14. (b)   15. (b)
Appendix 14A


14A.1 Derivation of Equations (14.2.4) and (14.2.5) 


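Write Eq. (14.2.2) as

$$u_i = Y_i - \beta_1 e^{\beta_2 X_i} \qquad (1)$$

Therefore,

$$\sum u_i^2 = \sum \left(Y_i - \beta_1 e^{\beta_2 X_i}\right)^2 \qquad (2)$$

The error sum of squares is thus a function of $\beta_1$ and $\beta_2$, since the values of Y and X are known. Therefore, to minimize the error sum of squares, we have to partially differentiate it with respect to the two unknowns, which gives:

$$\frac{\partial \sum u_i^2}{\partial \beta_1} = 2\sum \left(Y_i - \beta_1 e^{\beta_2 X_i}\right)\left(-1 \cdot e^{\beta_2 X_i}\right) \qquad (3)$$

$$\frac{\partial \sum u_i^2}{\partial \beta_2} = 2\sum \left(Y_i - \beta_1 e^{\beta_2 X_i}\right)\left(-\beta_1 e^{\beta_2 X_i} X_i\right) \qquad (4)$$

By the first-order condition of optimization, setting the preceding equations to zero and solving them simultaneously, we obtain Eqs. (14.2.4) and (14.2.5). Note that in differentiating the error sum of squares we have used the chain rule.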
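The chain-rule derivatives in Eqs. (3) and (4) are easy to get wrong by hand, so a quick numerical check can be reassuring. The Python sketch below compares them with central finite differences at the trial values $\beta_1 = 0.45$, $\beta_2 = 0.01$, using the Table 14.1 data as transcribed. The step size `h` is an arbitrary choice.

```python
import math

# Table 14.1 (fee in percent, asset in billions of dollars), as transcribed.
FEE = [0.520, 0.508, 0.484, 0.46, 0.4398, 0.4238,
       0.4115, 0.402, 0.3944, 0.388, 0.3825, 0.3738]
ASSET = [0.5, 5.0, 10, 15, 20, 25, 30, 35, 40, 45, 55, 60]


def error_ss(b1, b2):
    """Eq. (2): sum of (Y - b1*exp(b2*X))^2."""
    return sum((y - b1 * math.exp(b2 * x)) ** 2 for y, x in zip(FEE, ASSET))


def analytic_gradient(b1, b2):
    """Eqs. (3) and (4): derivatives of the error sum of squares."""
    d1 = sum(2 * (y - b1 * math.exp(b2 * x)) * (-math.exp(b2 * x))
             for y, x in zip(FEE, ASSET))
    d2 = sum(2 * (y - b1 * math.exp(b2 * x)) * (-b1 * math.exp(b2 * x) * x)
             for y, x in zip(FEE, ASSET))
    return d1, d2


def numeric_gradient(b1, b2, h=1e-6):
    """Central finite differences of error_ss."""
    d1 = (error_ss(b1 + h, b2) - error_ss(b1 - h, b2)) / (2 * h)
    d2 = (error_ss(b1, b2 + h) - error_ss(b1, b2 - h)) / (2 * h)
    return d1, d2


print("analytic:", analytic_gradient(0.45, 0.01))
print("numeric: ", numeric_gradient(0.45, 0.01))
```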


14A.2 The Linearization Method 


Students familiar with calculus will recall Taylor's theorem, which states that any arbitrary function $f(X)$ that is continuous and has a continuous nth-order derivative can be approximated around the point $X = X_0$ by a polynomial function and a remainder as follows:

$$f(X) = f(X_0) + \frac{f'(X_0)(X - X_0)}{1!} + \frac{f''(X_0)(X - X_0)^2}{2!} + \cdots + \frac{f^{(n)}(X_0)(X - X_0)^n}{n!} + R \qquad (1)$$

where $f'(X_0)$ is the first derivative of $f(X)$ evaluated at $X = X_0$, $f''(X_0)$ is the second derivative of $f(X)$ evaluated at $X = X_0$, and so on, where $n!$ (read n factorial) stands for $n(n-1)(n-2)\cdots 1$, with the convention that $0! = 1$, and $R$ stands for the remainder. If we take $n = 1$, we get a linear approximation; choosing $n = 2$, we get a second-degree polynomial approximation. As you can expect, the higher the order of the polynomial, the better the approximation to the original function usually is. The series given in Eq. (1) is called Taylor's series expansion of $f(X)$ around the point $X = X_0$. As an example, consider the function:

$$Y = f(X) = a_1 + a_2 X + a_3 X^2 + a_4 X^3$$

Suppose we want to approximate it around $X = 0$. We now obtain:

$$f(0) = a_1, \quad f'(0) = a_2, \quad f''(0) = 2a_3, \quad f'''(0) = 6a_4$$

Hence we can obtain the following approximations:

First order: $Y = a_1 + \dfrac{f'(0)}{1!}X = a_1 + a_2 X + \text{remainder } (= a_3 X^2 + a_4 X^3)$

Second order: $Y = f(0) + \dfrac{f'(0)}{1!}X + \dfrac{f''(0)}{2!}X^2 = a_1 + a_2 X + a_3 X^2 + \text{remainder } (= a_4 X^3)$

Third order: $Y = a_1 + a_2 X + a_3 X^2 + a_4 X^3$

The third-order approximation reproduces the original equation exactly.
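The cubic example can be checked numerically. The Python sketch below builds each truncated Taylor polynomial from the derivatives at zero listed above and compares it with the exact cubic. The coefficient values and the evaluation point are hypothetical choices made only for illustration.

```python
import math


def cubic(x, a):
    """Y = a1 + a2*X + a3*X^2 + a4*X^3."""
    a1, a2, a3, a4 = a
    return a1 + a2 * x + a3 * x ** 2 + a4 * x ** 3


def derivatives_at_zero(a):
    """f(0), f'(0), f''(0), f'''(0) for the cubic."""
    a1, a2, a3, a4 = a
    return [a1, a2, 2 * a3, 6 * a4]


def taylor_at_zero(x, a, order):
    """Sum of f^(k)(0) * x^k / k! for k = 0..order."""
    d = derivatives_at_zero(a)
    return sum(d[k] * x ** k / math.factorial(k) for k in range(order + 1))


a = (1.0, 2.0, -0.5, 0.25)   # hypothetical a1..a4
x = 1.5                      # hypothetical evaluation point
exact = cubic(x, a)
for order in (1, 2, 3):
    approx = taylor_at_zero(x, a, order)
    print(f"order {order}: approx = {approx:.5f}, remainder = {exact - approx:.5f}")
```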
The objective of Taylor series approximation is usually to choose a lower-order polynomial in the hope that the remainder term will be inconsequential. It is often used to approximate a nonlinear function by a linear function, by dropping the higher-order terms.

The Taylor series approximation can be easily extended to a function containing more than one X. For example, consider the following function:

$$Y = f(X, Z) \qquad (2)$$

and suppose we want to expand it around $X = a$ and $Z = b$. Taylor's theorem shows that

$$f(x, z) = f(a, b) + f_x(a, b)(x - a) + f_z(a, b)(z - b) + \frac{1}{2!}\left[f_{xx}(a, b)(x - a)^2 + 2 f_{xz}(a, b)(x - a)(z - b) + f_{zz}(a, b)(z - b)^2\right] + \cdots \qquad (3)$$

where $f_x$ = partial derivative of the function with respect to (w.r.t.) X, $f_{xx}$ = second partial derivative of the function w.r.t. X, and similarly for the variable Z. If we want a linear approximation to the function, we use the constant term and the two first-order terms in Eq. (3); if we want a quadratic, or second-degree, approximation, we also include the second-order terms in brackets, and so on.


14A.3 Linear Approximation of the Exponential Function Given in 
Equation (14.2.2) 


The function under consideration is:

$$Y = f(\beta_1, \beta_2) = \beta_1 e^{\beta_2 X} \qquad (1)$$

Note: For ease of manipulation, we have dropped the observation subscript.

Remember that in this function the unknowns are the β coefficients. Let us linearize this function at $\beta_1 = \beta_1^*$ and $\beta_2 = \beta_2^*$, where the starred quantities are given fixed values. To linearize this, we proceed as follows:

$$Y = f(\beta_1, \beta_2) \approx f(\beta_1^*, \beta_2^*) + f_{\beta_1}(\beta_1^*, \beta_2^*)(\beta_1 - \beta_1^*) + f_{\beta_2}(\beta_1^*, \beta_2^*)(\beta_2 - \beta_2^*) \qquad (2)$$

where $f_{\beta_1}$ and $f_{\beta_2}$ are the partial derivatives of the function (1) with respect to the unknowns, and these derivatives are evaluated at the (assumed) starred values of the unknown parameters. Note that we are using only the first derivatives in the preceding expression, since we are linearizing the function. Now assume that $\beta_1^* = 0.45$ and $\beta_2^* = 0.01$, which are pure guess-estimates of the true coefficients. Now

$$f(\beta_1^* = 0.45, \beta_2^* = 0.01) = 0.45\,e^{0.01 X_i}$$

$$f_{\beta_1} = e^{\beta_2 X_i} \quad \text{and} \quad f_{\beta_2} = \beta_1 X_i e^{\beta_2 X_i} \qquad (3)$$

by the standard rules of differentiation. Evaluating these derivatives at the given values and reverting to Eq. (2), we obtain:

$$Y_i = 0.45\,e^{0.01 X_i} + e^{0.01 X_i}(\beta_1 - 0.45) + 0.45 X_i e^{0.01 X_i}(\beta_2 - 0.01) \qquad (4)$$

which we write as:

$$\left(Y_i - 0.45\,e^{0.01 X_i}\right) = e^{0.01 X_i}\alpha_1 + 0.45 X_i e^{0.01 X_i}\alpha_2 \qquad (5)$$

where

$$\alpha_1 = (\beta_1 - 0.45) \quad \text{and} \quad \alpha_2 = (\beta_2 - 0.01) \qquad (6)$$

Now let $Y_i^* = \left(Y_i - 0.45\,e^{0.01 X_i}\right)$, $X_{1i} = e^{0.01 X_i}$, and $X_{2i} = 0.45 X_i e^{0.01 X_i}$. Using these definitions and adding the error term $u_i$, we can finally write Eq. (5) as:

$$Y_i^* = \alpha_1 X_{1i} + \alpha_2 X_{2i} + u_i \qquad (7)$$

Lo and behold, we now have a linear regression model. Since $Y_i^*$, $X_{1i}$, and $X_{2i}$ can be readily computed from the data, we can easily estimate Eq. (7) by OLS and obtain the values of $\alpha_1$ and $\alpha_2$. Then, from Eq. (6), we obtain:

$$\beta_1 = \alpha_1 + 0.45 \quad \text{and} \quad \beta_2 = \alpha_2 + 0.01 \qquad (8)$$
Call these values $\beta_1^{**}$ and $\beta_2^{**}$, respectively. Using these (revised) values, we can start the iterative process given in Eq. (2) again, obtaining yet another set of values of the β coefficients. We can go on iterating (or linearizing) in this fashion until there is no substantial change in the values of the β coefficients. In Example 14.1, it took five iterations, but for the Mexican Cobb-Douglas example (Example 14.2), it took 32 iterations. But the underlying logic behind these iterations is the procedure just illustrated.

For the mutual fund fee structure example in Section 14.3, the values of $Y^*$, $X_1$, and $X_2$ defined above are shown in Table 14.4; the basic data are given in Table 14.1. From these values, the regression results corresponding to Eq. (7) are:


Dependent variable: Y*
Method: Least squares

Variable   Coefficient   Std. Error   t-Statistic   Prob.
X1          0.022739     0.014126       1.609705    0.1385
X2         −0.010693     0.000790     −13.52990     0.0000

R² = 0.968324    Durbin-Watson d statistic = 0.308883


Now using Eq. (8), the reader can verify that 


$$\beta_1^{**} = 0.4727 \quad \text{and} \quad \beta_2^{**} = -0.00069 \qquad (9)$$

Table 14.4

Y*          X1          X2
 0.067744   1.005013     0.226128
 0.034928   1.051271     2.365360
−0.013327   1.105171     4.973269
−0.062825   1.161834     7.842381
−0.109831   1.221403    10.99262
−0.154011   1.284025    14.44529
−0.195936   1.349859    18.22309
−0.236580   1.419068    22.35031
−0.276921   1.491825    26.85284
−0.317740   1.568312    31.75832
−0.397464   1.733253    42.89801
−0.446153   1.822119    49.19721


Contrast these numbers with the initial guesses of 0.45 and 0.01, respectively, for the two parameters. Using the new 
estimates given in Eq. (9), you can start the iterative procedure once more and go on iterating until there is “convergence” 
in the sense that the final round of the estimates does not differ much from the round before that. Of course, you will 
require fewer iterations if your initial guess is closer to the final values. Also, notice that we have used only the linear term 
in Taylor’s series expansion. If you were to use the quadratic or higher-order terms in the expansion, perhaps you would 
reach the final values much quicker. But in many applications the linear approximation has proved to be quite good. 
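The linearize-and-iterate procedure described above is what is usually called the Gauss-Newton method. The Python sketch below applies it to the exponential fee model $Y_i = \beta_1 e^{\beta_2 X_i} + u_i$ of Eqs. (4) to (8). It assumes the underlying data can be recovered from Table 14.4 as $X_i = 100\ln X_{1i}$ and $Y_i = Y_i^* + 0.45X_{1i}$; the function names are illustrative, and the stopping rule (absolute change below 1e-8) is an arbitrary choice, not the rule any particular package uses.

```python
import math

# Data implied by Table 14.4 (assumption: X = 100*ln(X1), Y = Y* + 0.45*X1)
ASSETS = [0.5, 5, 10, 15, 20, 25, 30, 35, 40, 45, 55, 60]
FEES = [0.520, 0.508, 0.484, 0.460, 0.4398, 0.4238,
        0.4115, 0.402, 0.3944, 0.388, 0.3825, 0.3738]


def linearized_step(b1, b2, x, y):
    """One Taylor-linearization step for Y = b1*exp(b2*X) + u, as in Eqs. (4)-(8)."""
    # Regressand and regressors of Eq. (7), evaluated at the current guesses
    x1 = [math.exp(b2 * xi) for xi in x]
    x2 = [b1 * xi * math.exp(b2 * xi) for xi in x]
    ystar = [yi - b1 * math.exp(b2 * xi) for xi, yi in zip(x, y)]
    # OLS without intercept: solve the 2x2 normal equations by Cramer's rule
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    s1y = sum(a * c for a, c in zip(x1, ystar))
    s2y = sum(b * c for b, c in zip(x2, ystar))
    det = s11 * s22 - s12 * s12
    a1 = (s1y * s22 - s12 * s2y) / det
    a2 = (s11 * s2y - s12 * s1y) / det
    # Eq. (8): revised parameter values
    return b1 + a1, b2 + a2


def iterate(x, y, b1, b2, tol=1e-8, max_iter=200):
    """Repeat the linearization until the estimates stop changing."""
    for n_iter in range(1, max_iter + 1):
        new_b1, new_b2 = linearized_step(b1, b2, x, y)
        if abs(new_b1 - b1) < tol and abs(new_b2 - b2) < tol:
            return new_b1, new_b2, n_iter
        b1, b2 = new_b1, new_b2
    raise RuntimeError("no convergence within max_iter iterations")


print("after one step:", linearized_step(0.45, 0.01, ASSETS, FEES))
print("converged:", iterate(ASSETS, FEES, 0.45, 0.01))
```

A single call to `linearized_step` from (0.45, 0.01) should reproduce, up to rounding, the values in Eq. (9). The iteration count depends on the tolerance and need not match the counts reported by other software.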


Qualitative Response Regression Models


In all the regression models that we have considered so far, we have implicitly assumed that the regressand, the dependent variable, or the response variable Y is quantitative, whereas the explanatory variables are either quantitative, qualitative (or dummy), or a mixture thereof. In fact, in Chapter 9, on dummy variables, we saw how the dummy regressors are introduced in a regression model and what role they play in specific situations.

In this chapter we consider several models in which the regressand itself is qualitative in nature. Although 
increasingly used in various areas of social sciences and medical research, qualitative response regression 
models pose interesting estimation and interpretation challenges. In this chapter we only touch on some of 
the major themes in this area, leaving the details to more specialized books.¹


15.1 The Nature of Qualitative Response Models 


Suppose we want to study the labor force participation (LFP) decision of adult males. Since an adult is 
either in the labor force or not, LFP is a yes or no decision. Hence, the response variable, or regressand, can 
take only two values, say, 1 if the person is in the labor force and 0 if he or she is not. In other words, the
regressand is a binary, or dichotomous, variable. Labor economics research suggests that the LFP decision 
is a function of the unemployment rate, average wage rate, education, family income, etc. 

As another example, consider U.S. presidential elections. Assume that there are two political parties, 
Democratic and Republican. The dependent variable here is vote choice between the two political parties. 
Suppose we let Y = 1, if the vote is for a Democratic candidate, and Y = 0, if the vote is for a Republican 
candidate. A considerable amount of research on this topic has been done by the economist Ray Fair of Yale 
University and several political scientists.² Some of the variables used in the vote choice are growth rate of GDP, unemployment and inflation rates, whether the candidate is running for reelection, etc. For the present purposes, the important thing to note is that the regressand is a qualitative variable.

1At the introductory level, the reader may find the following sources very useful: Daniel A. Powers and Yu Xie, Statistical Methods for Categorical Data Analysis, Academic Press, 2000; John H. Aldrich and Forrest Nelson, Linear Probability, Logit, and Probit Models, Sage Publications, 1984; and Tim Futing Liao, Interpreting Probability Models: Logit, Probit and Other Generalized Linear Models, Sage Publications, 1994. For a very comprehensive review of the literature, see G. S. Maddala, Limited-Dependent and Qualitative Variables in Econometrics, Cambridge University Press, 1983.

2See, for example, Ray Fair, “Econometrics and Presidential Elections,” Journal of Economic Perspectives, Summer 1996, pp. 89-102, and Michael S. Lewis-Beck, Economics and Elections: The Major Western Democracies, University of Michigan Press, Ann Arbor, 1980.

One can think of several other examples where the regressand is qualitative in nature. Thus, a family either 
owns a house or it does not, it has disability insurance or it does not, both husband and wife are in the labor 
force or only one spouse is. Similarly, a certain drug is effective in curing an illness or it is not. A firm decides 
to declare a stock dividend or not, a senator decides to vote for a tax cut or not, a U.S. president decides to 
veto a bill or accept it, etc. 

We do not have to restrict our response variable to yes/no or dichotomous categories only. Returning to our presidential elections example, suppose there are three parties, Democratic, Republican, and Independent. The response variable here is trichotomous. In general, we can have a polychotomous (or multiple-category) response variable.

What we plan to do is to first consider the dichotomous regressand and then consider various extensions of 
the basic model. But before we do that, it is important to note a fundamental difference between a regression 
model where the regressand Y is quantitative and a model where it is qualitative. 

In a model where Y is quantitative, our objective is to estimate its expected, or mean, value given the values of the regressors. In terms of Chapter 2, what we want is $E(Y_i \mid X_{1i}, X_{2i}, \ldots, X_{ki})$, where the X's are regressors, both quantitative and qualitative. In models where Y is qualitative, our objective is to find the probability of
something happening, such as voting for a Democratic candidate, or owning a house, or belonging to a union, 
or participating in a sport, etc. Hence, qualitative response regression models are often known as probability 
models. 

In the rest of this chapter, we seek answers to the following questions: 


1. How do we estimate qualitative response regression models? Can we simply estimate them with the 
usual OLS procedures? 

2. Are there special inference problems? In other words, is the hypothesis testing procedure any different 
from the ones we have learned so far? 

3. If a regressand is qualitative, how can we measure the goodness of fit of such models? Is the conventionally computed R² of any value in such models?

4. Once we go beyond the dichotomous regressand case, how do we estimate and interpret the polychotomous regression models? Also, how do we handle models in which the regressand is ordinal, that is, an ordered categorical variable, such as schooling (less than 8 years, 8 to 11 years, 12 years, and 13 or more years), or the regressand is nominal where there is no inherent ordering, such as ethnicity (Black, White, Hispanic, Asian, and other)?

5. How do we model phenomena such as the number of visits to one's physician per year, the number of patents received by a firm in a given year, the number of articles published by a college professor in a year, the number of telephone calls received in a span of 5 minutes, or the number of cars passing through a toll booth in a span of 5 minutes? Such phenomena, called count data, or rare event data, are an example of the Poisson (probability) process.


In this chapter we provide answers to some of these questions at the elementary level, for some of the 
topics are quite advanced and require more background in mathematics and statistics than assumed in this 
book. References cited in the various footnotes may be consulted for further details. 

We start our study of qualitative response models by first considering the binary response regression 
model. There are four approaches to developing a probability model for a binary response variable: 


1. The linear probability model (LPM)
2. The logit model
3. The probit model
4. The tobit model

Because of its comparative simplicity, and because it can be estimated by ordinary least squares (OLS), we will first consider the LPM, leaving the other three models for subsequent sections.


15.2 The Linear Probability Model (LPM) 


To fix ideas, consider the following regression model: 


$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad (15.2.1)$$

where X = family income and Y = 1 if the family owns a house and 0 if it does not own a house.

Model (15.2.1) looks like a typical linear regression model, but because the regressand is binary, or dichotomous, it is called a linear probability model (LPM). This is because the conditional expectation of $Y_i$ given $X_i$, $E(Y_i \mid X_i)$, can be interpreted as the conditional probability that the event will occur given $X_i$, that is, $\Pr(Y_i = 1 \mid X_i)$. Thus, in our example, $E(Y_i \mid X_i)$ gives the probability that a family whose income is the given amount $X_i$ owns a house.

The justification of the name LPM for models like Eq. (15.2.1) can be seen as follows: Assuming $E(u_i) = 0$, as usual (to obtain unbiased estimators), we obtain

$$E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i \qquad (15.2.2)$$


Now, if $P_i$ = probability that $Y_i = 1$ (that is, the event occurs), and $(1 - P_i)$ = probability that $Y_i = 0$ (that is, the event does not occur), the variable $Y_i$ has the following (probability) distribution:

Y_i      Probability
0        1 - P_i
1        P_i
Total    1

That is, $Y_i$ follows the Bernoulli probability distribution.
Now, by the definition of mathematical expectation, we obtain:

$$E(Y_i) = 0(1 - P_i) + 1(P_i) = P_i \qquad (15.2.3)$$

Comparing Eq. (15.2.2) with Eq. (15.2.3), we can equate

$$E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i = P_i \qquad (15.2.4)$$


that is, the conditional expectation of the model (15.2.1) can, in fact, be interpreted as the conditional probability of $Y_i$. In general, the expectation of a Bernoulli random variable is the probability that the random variable equals 1. In passing, note that if there are n independent trials, each with a probability p of success and probability (1 - p) of failure, and X of these trials represent the number of successes, then X is said to follow the binomial distribution. The mean of the binomial distribution is np and its variance is np(1 - p). The term success is defined in the context of the problem.
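To see these moments numerically, the short sketch below computes the mean and variance of a binomial variable directly from its probability mass function. With n = 1 it reduces to the Bernoulli case, giving mean $P_i$ and variance $P_i(1 - P_i)$. The function name is illustrative only.

```python
from math import comb


def binomial_moments(n, p):
    """Mean and variance of a binomial(n, p) variable, computed from its pmf."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * prob for k, prob in enumerate(pmf))
    var = sum((k - mean) ** 2 * prob for k, prob in enumerate(pmf))
    return mean, var


# n = 1 is the Bernoulli case; n = 10 should give np = 3 and np(1 - p) = 2.1
print(binomial_moments(1, 0.3))
print(binomial_moments(10, 0.3))
```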

Since the probability $P_i$ must lie between 0 and 1, we have the restriction

$$0 \le E(Y_i \mid X_i) \le 1 \qquad (15.2.5)$$

that is, the conditional expectation (or conditional probability) must lie between 0 and 1.
From the preceding discussion it would seem that OLS can be easily extended to binary dependent variable regression models. So, perhaps there is nothing new here. Unfortunately, this is not the case, for the LPM poses several problems, which are as follows:
Non-Normality of the Disturbances $u_i$


Although OLS does not require the disturbances ($u_i$) to be normally distributed, we assumed them to be so distributed for the purpose of statistical inference.³ But the assumption of normality for $u_i$ is not tenable for the LPMs because, like $Y_i$, the disturbances $u_i$ also take only two values; that is, they also follow the Bernoulli distribution. This can be seen clearly if we write Eq. (15.2.1) as

$$u_i = Y_i - \beta_1 - \beta_2 X_i \qquad (15.2.6)$$

The probability distribution of $u_i$ is

                  u_i                     Probability
When Y_i = 1      1 - β_1 - β_2 X_i       P_i               (15.2.7)
When Y_i = 0      -β_1 - β_2 X_i          (1 - P_i)

Obviously, $u_i$ cannot be assumed to be normally distributed; they follow the Bernoulli distribution.
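As a quick consistency check (a worked step added here), the two-point distribution (15.2.7) has mean zero once we use $P_i = \beta_1 + \beta_2 X_i$ from Eq. (15.2.4):

$$E(u_i) = (1 - \beta_1 - \beta_2 X_i)P_i + (-\beta_1 - \beta_2 X_i)(1 - P_i) = P_i - (\beta_1 + \beta_2 X_i) = 0$$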

But the nonfulfillment of the normality assumption may not be so critical as it appears because we 
know that the OLS point estimates still remain unbiased (recall that, if the objective is point estimation, the 
normality assumption is not necessary). Besides, as the sample size increases indefinitely, statistical theory 
shows that the OLS estimators tend to be normally distributed generally.⁴ As a result, in large samples the
statistical inference of the LPM will follow the usual OLS procedure under the normality assumption. 


Heteroscedastic Variances of the Disturbances 


Even if $E(u_i) = 0$ and $\operatorname{cov}(u_i, u_j) = 0$ for $i \neq j$ (i.e., no serial correlation), it can no longer be maintained that
in the LPM the disturbances are homoscedastic. This is, however, not surprising. As statistical theory shows, 
for a Bernoulli distribution the theoretical mean and variance are, respectively, p and p(1 — p), where p is the 
probability of success (i.e., something happening), showing that the variance is a function of the mean. Hence 
the error variance is heteroscedastic. 

For the distribution of the error term given in Eq. (15.2.7), applying the definition of variance, the reader should verify that (see Exercise 15.10)

$$\operatorname{var}(u_i) = P_i(1 - P_i) \qquad (15.2.8)$$

That is, the variance of the error term in the LPM is heteroscedastic. Since $P_i = E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i$, the variance of $u_i$ ultimately depends on the values of X and hence is not homoscedastic.
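For readers who want to check Eq. (15.2.8), one route is sketched below. It uses $E(u_i) = 0$, the distribution (15.2.7), and $\beta_1 + \beta_2 X_i = P_i$ from Eq. (15.2.4):

$$\begin{aligned}
\operatorname{var}(u_i) = E(u_i^2) &= (1 - \beta_1 - \beta_2 X_i)^2 P_i + (-\beta_1 - \beta_2 X_i)^2 (1 - P_i) \\
&= (1 - P_i)^2 P_i + P_i^2 (1 - P_i) \\
&= P_i(1 - P_i)\left[(1 - P_i) + P_i\right] = P_i(1 - P_i)
\end{aligned}$$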

We already know that, in the presence of heteroscedasticity, the OLS estimators, although unbiased, are 
not efficient; that is, they do not have minimum variance. But the problem of heteroscedasticity, like the 
problem of non-normality, is not insurmountable. In Chapter 11 we discussed several methods of handling the 
heteroscedasticity problem. Since the variance of $u_i$ depends on $E(Y_i \mid X_i)$, one way to resolve the heteroscedasticity problem is to transform the model (15.2.1) by dividing it through by

$$\sqrt{E(Y_i \mid X_i)\left[1 - E(Y_i \mid X_i)\right]} = \sqrt{P_i(1 - P_i)} = \sqrt{w_i}, \text{ say}$$

that is,

$$\frac{Y_i}{\sqrt{w_i}} = \frac{\beta_1}{\sqrt{w_i}} + \beta_2 \frac{X_i}{\sqrt{w_i}} + \frac{u_i}{\sqrt{w_i}} \qquad (15.2.9)$$

As you can readily verify, the transformed error term in Eq. (15.2.9) is homoscedastic. Therefore, after estimating Eq. (15.2.1), we can now estimate Eq. (15.2.9) by OLS, which is nothing but weighted least squares (WLS) with the $w_i$ supplying the weights (each observation is divided by $\sqrt{w_i}$).

In theory, what we have just described is fine. But in practice the true $E(Y_i \mid X_i)$ is unknown; hence the weights $w_i$ are unknown. To estimate $w_i$, we can use the following two-step procedure:⁵

3Recall that we have recommended that the normality assumption be checked in an application by suitable normality tests, such as the Jarque-Bera test.

4The proof is based on the central limit theorem and may be found in E. Malinvaud, Statistical Methods of Econometrics, Rand McNally, Chicago, 1966, pp. 195-197. If the regressors are deemed stochastic and are jointly normally distributed, the F and t tests can still be used even though the disturbances are non-normal. Also keep in mind that as the sample size increases indefinitely, the binomial distribution converges to the normal distribution.


Step 1. Run the OLS regression (15.2.1) despite the heteroscedasticity problem and obtain $\hat{Y}_i$ = estimate of the true $E(Y_i \mid X_i)$. Then obtain $\hat{w}_i = \hat{Y}_i(1 - \hat{Y}_i)$, the estimate of $w_i$.

Step 2. Use the estimated $w_i$ to transform the data as shown in Eq. (15.2.9) and estimate the transformed equation by OLS (i.e., weighted least squares).
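The two-step procedure can be sketched in Python using the home ownership data of Table 15.1 (with family 7's income read as 20, consistent with its fitted value in Table 15.2). This is a minimal sketch under assumptions: the helper names are hypothetical, and observations whose fitted $\hat{Y}_i$ fall outside (0, 1) are simply dropped, which is the treatment adopted in Example 15.1 (footnote 8 mentions an alternative). Dividing Eq. (15.2.1) through by $\sqrt{w_i}$ and running OLS yields the same coefficients as minimizing $\sum e_i^2 / w_i$, which is what `weighted_ls` does.

```python
# Table 15.1: Y = 1 if the family owns a house; X = income (thousands of dollars)
Y = [0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1,
     1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1]
X = [8, 16, 18, 11, 12, 19, 20, 13, 9, 10, 17, 18, 14, 20, 6, 19, 16, 10, 8, 18,
     22, 16, 12, 11, 16, 11, 20, 18, 11, 10, 17, 13, 21, 20, 11, 8, 17, 16, 7, 17]


def weighted_ls(x, y, weights):
    """Fit y on a constant and x by minimizing sum(weights * residual**2)."""
    sw = sum(weights)
    xbar = sum(w * xi for w, xi in zip(weights, x)) / sw
    ybar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxy = sum(w * (xi - xbar) * (yi - ybar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def lpm_two_step(x, y):
    # Step 1: OLS on Eq. (15.2.1), then estimated weights w_i = Yhat_i(1 - Yhat_i)
    b1, b2 = weighted_ls(x, y, [1.0] * len(x))
    yhat = [b1 + b2 * xi for xi in x]
    keep = [i for i, p in enumerate(yhat) if 0 < p < 1]  # w_i must be positive
    w = [yhat[i] * (1 - yhat[i]) for i in keep]
    # Step 2: weighting each squared residual by 1/w_i is equivalent to
    # running OLS on the data divided by sqrt(w_i), as in Eq. (15.2.9)
    c1, c2 = weighted_ls([x[i] for i in keep], [y[i] for i in keep],
                         [1 / wi for wi in w])
    return (b1, b2), (c1, c2), len(keep)


ols, wls, n_used = lpm_two_step(X, Y)
print("OLS:", ols)
print("WLS:", wls, "observations used:", n_used)
```

Only the coefficients are computed here; the standard errors reported for a weighted regression would come from the transformed regression (15.2.9).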


Although we will illustrate this procedure for our example shortly, it may be noted that we can also use White's heteroscedasticity-corrected standard errors to deal with heteroscedasticity, provided the sample is reasonably large.
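White's correction replaces the usual OLS variance formula with the "sandwich" $(X'X)^{-1}X'\,\mathrm{diag}(\hat{u}_i^2)\,X(X'X)^{-1}$. Below is a minimal pure-Python sketch for the one-regressor LPM, using the data of Table 15.1 (family 7's income read as 20). It computes White's original HC0 form. Statistical packages often apply a small-sample scaling such as n/(n - k), so their numbers may differ slightly. The function name is hypothetical.

```python
import math

# Table 15.1 data: Y = 1 if the family owns a house; X = income (thousands of dollars)
Y = [0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1,
     1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1]
X = [8, 16, 18, 11, 12, 19, 20, 13, 9, 10, 17, 18, 14, 20, 6, 19, 16, 10, 8, 18,
     22, 16, 12, 11, 16, 11, 20, 18, 11, 10, 17, 13, 21, 20, 11, 8, 17, 16, 7, 17]


def ols_white_se(x, y):
    """OLS of y on [1, x] with White (HC0) heteroscedasticity-robust standard errors."""
    n = len(x)
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X'X)^-1
    b = [inv[0][0] * sy + inv[0][1] * sxy, inv[1][0] * sy + inv[1][1] * sxy]
    e2 = [(yi - b[0] - b[1] * xi) ** 2 for xi, yi in zip(x, y)]
    # "Meat" of the sandwich: sum over i of e_i^2 * [1, x_i]'[1, x_i]
    m01 = sum(e * xi for e, xi in zip(e2, x))
    meat = [[sum(e2), m01], [m01, sum(e * xi * xi for e, xi in zip(e2, x))]]

    def matmul(a, c):
        return [[sum(a[i][k] * c[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    cov = matmul(matmul(inv, meat), inv)  # (X'X)^-1 X' diag(e^2) X (X'X)^-1
    return b, [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]


coef, robust_se = ols_white_se(X, Y)
print("coefficients:", coef)
print("White (HC0) standard errors:", robust_se)
```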

Even if we correct for heteroscedasticity, we first need to address another problem that plagues the LPM.


Nonfulfillment of $0 \le E(Y_i \mid X_i) \le 1$


Since $E(Y_i \mid X_i)$ in the linear probability models measures the conditional probability of the event Y occurring given X, it must necessarily lie between 0 and 1. Although this is true a priori, there is no guarantee that $\hat{Y}_i$, the estimators of $E(Y_i \mid X_i)$, will necessarily fulfill this restriction, and this is the real problem with the OLS estimation of the LPM. This happens because OLS does not take into account the restriction that $0 \le E(Y_i \mid X_i) \le 1$ (an inequality restriction). There are two ways of dealing with this problem. One is to estimate the LPM by the usual OLS method and find out whether the estimated $\hat{Y}_i$ lie between 0 and 1. If some are less than 0 (that is, negative), $\hat{Y}_i$ is assumed to be zero for those cases; if they are greater than 1, they are assumed to be 1. The second procedure is to devise an estimating technique that will guarantee that the estimated conditional probabilities $\hat{Y}_i$ will lie between 0 and 1. The logit and probit models discussed later will guarantee that the estimated probabilities will indeed lie between the logical limits 0 and 1.


Questionable Value of R² as a Measure of Goodness of Fit


The conventionally computed R² is of limited value in the dichotomous response models. To see why, consider Figure 15.1. Corresponding to a given X, Y is either 0 or 1. Therefore, all the Y values will lie either along the X axis or along the line corresponding to 1. Therefore, generally no LPM is expected to fit such a scatter well, whether it is the unconstrained LPM (Figure 15.1a) or the truncated or constrained LPM (Figure 15.1b), an LPM estimated in such a way that it will not fall outside the logical band 0-1. As a result, the conventionally computed R² is likely to be much lower than 1 for such models. In most practical applications the R² ranges between 0.2 and 0.6. R² in such models will be high, say, in excess of 0.8, only when the actual scatter is very closely clustered around points A and B (Figure 15.1c), for in that case it is easy to fix the straight line by joining the two points A and B. In this case the predicted $\hat{Y}_i$ will be very close to either 0 or 1.


5For the justification of this procedure, see Arthur S. Goldberger, Econometric Theory, John Wiley & Sons, New York, 1964,
pp. 249-250. The justification is basically a large-sample one that we discussed under the topic of feasible or estimated 
generalized least squares in the chapter on heteroscedasticity (see Sec. 11.6). 
[Figure 15.1 Linear probability models: (a) LPM (unconstrained); (b) LPM (constrained); (c) observations clustered around points A and B.]


For these reasons John Aldrich and Forrest Nelson contend that “use of the coefficient of determination as a summary statistic should be avoided in models with qualitative dependent variable[s].”⁶


Example 15.1 LPM: A Numerical Example 


To illustrate some of the points made about the LPM in this section, we present a numerical example. Table 15.1 gives invented data on home ownership Y (1 = owns a house, 0 = does not own a house) and family income X (thousands of dollars) for 40 families. From these data the LPM estimated by OLS was as follows:


6Aldrich and Nelson, op. cit., p. 15. For other measures of goodness of fit in models involving dummy regressands, see T. 
Amemiya, “Qualitative Response Models,” Journal of Economic Literature, vol. 19, 1981, pp. 331-354. 
Table 15.1 Hypothetical Data on Home Ownership (Y = 1 if owns home, 0 otherwise) and Income X (thousands of dollars)

Family   Y    X       Family   Y    X
1        0    8       21       1    22
2        1    16      22       1    16
3        1    18      23       0    12
4        0    11      24       0    11
5        0    12      25       1    16
6        1    19      26       0    11
7        1    20      27       1    20
8        0    13      28       1    18
9        0    9       29       0    11
10       0    10      30       0    10
11       1    17      31       1    17
12       1    18      32       0    13
13       0    14      33       1    21
14       1    20      34       1    20
15       0    6       35       0    11
16       1    19      36       0    8
17       1    16      37       1    17
18       0    10      38       1    16
19       0    8       39       0    7
20       1    18      40       1    17


$$\hat{Y}_i = -0.9457 + 0.1021X_i$$

se = (0.1228)    (0.0082)                   (15.2.10)
t = (-7.6984)    (12.515)     R² = 0.8048
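The figures in Eq. (15.2.10) can be reproduced with the textbook formulas for the two-variable model. The sketch below uses a hypothetical function name and conventional standard errors, which assume homoscedasticity (an assumption the LPM violates, as discussed earlier). It uses the Table 15.1 data with family 7's income read as 20.

```python
import math

# Table 15.1 data: Y = 1 if the family owns a house; X = income (thousands of dollars)
Y = [0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1,
     1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1]
X = [8, 16, 18, 11, 12, 19, 20, 13, 9, 10, 17, 18, 14, 20, 6, 19, 16, 10, 8, 18,
     22, 16, 12, 11, 16, 11, 20, 18, 11, 10, 17, 13, 21, 20, 11, 8, 17, 16, 7, 17]


def ols_summary(x, y):
    """Two-variable OLS with conventional standard errors, t ratios, and R^2."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    rss = syy - b2 * sxy             # residual sum of squares
    sigma2 = rss / (n - 2)           # estimated error variance
    se1 = math.sqrt(sigma2 * (1 / n + xbar ** 2 / sxx))
    se2 = math.sqrt(sigma2 / sxx)
    return {"b": (b1, b2), "se": (se1, se2),
            "t": (b1 / se1, b2 / se2), "r2": 1 - rss / syy}


print(ols_summary(X, Y))
```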


First, let us interpret this regression. The intercept of -0.9457 gives the “probability” that a family with zero income will own a house. Since this value is negative, and since probability cannot be negative, we treat this value as zero, which is sensible in the present instance.⁷ The slope value of 0.1021 means that for a unit change in income (here, 1,000 dollars), on the average the probability of owning a house increases by 0.1021, or about 10 percentage points. Of course, given a particular level of income, we can estimate the actual probability of owning a house from Eq. (15.2.10). Thus, for X = 12 (12,000 dollars), the estimated probability of owning a house is

$$(\hat{Y}_i \mid X = 12) = -0.9457 + 12(0.1021) = 0.2795$$

That is, the probability that a family with an income of 12,000 dollars will own a house is about 28 percent. Table 15.2 shows the estimated probabilities, $\hat{Y}_i$, for the various income levels listed in Table 15.1. The most noticeable feature of Table 15.2 is that six estimated values are negative and six values are in excess of 1, demonstrating clearly the point made earlier that, although $E(Y_i \mid X_i)$ is positive and less than 1, its estimators, $\hat{Y}_i$, need not necessarily be positive or less than 1. This is one reason that the LPM is not the recommended model when the dependent variable is dichotomous.
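Using the rounded coefficients of Eq. (15.2.10), a short sketch can count the fitted values that fall outside the unit interval and apply the first remedy described earlier, setting negative estimates to 0 and estimates above 1 to 1. The data are those of Table 15.1, with family 7's income taken as 20.

```python
# Table 15.1 incomes (thousands of dollars), families 1-40
X = [8, 16, 18, 11, 12, 19, 20, 13, 9, 10, 17, 18, 14, 20, 6, 19, 16, 10, 8, 18,
     22, 16, 12, 11, 16, 11, 20, 18, 11, 10, 17, 13, 21, 20, 11, 8, 17, 16, 7, 17]
B1, B2 = -0.9457, 0.1021  # estimated coefficients from Eq. (15.2.10)

fitted = [B1 + B2 * xi for xi in X]
n_negative = sum(p < 0 for p in fitted)
n_above_one = sum(p > 1 for p in fitted)
# Remedy 1: treat negative estimates as 0 and estimates above 1 as 1
truncated = [min(max(p, 0.0), 1.0) for p in fitted]

print(n_negative, n_above_one)   # the text reports six of each
print(round(B1 + B2 * 12, 4))    # estimated probability at X = 12
```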


7One can loosely interpret the highly negative value as near improbability of owning a house when income is zero. 
Even if the estimated $\hat{Y}_i$ were all positive and less than 1, the LPM would still suffer from the problem of heteroscedasticity, which can be seen readily from Eq. (15.2.8). As a consequence, we cannot trust the estimated standard errors reported in Eq. (15.2.10). (Why?) But we can use the weighted least-squares (WLS) procedure discussed earlier to obtain more efficient estimates, along with more reliable standard errors. The necessary weights, $w_i$, required for the application of WLS are also shown in Table 15.2. But note that since some $\hat{Y}_i$ are negative and some are in excess of one, the $w_i$ corresponding to these values will be negative. Thus, we cannot use these observations in WLS (why?), thereby reducing the number of observations from 40 to 28 in the present example.⁸ Omitting these observations, the WLS regression is


$$\frac{\hat{Y}_i}{\sqrt{w_i}} = -1.2456\,\frac{1}{\sqrt{w_i}} + 0.1196\,\frac{X_i}{\sqrt{w_i}}$$

se = (0.1206)     (0.0069)                   (15.2.11)
t = (-10.332)     (17.454)     R² = 0.9214

Table 15.2 Actual Y, Estimated Y, and Weights w for the Home Ownership Example

Y_i   Ŷ_i        w_i‡      √w_i     |   Y_i   Ŷ_i        w_i‡      √w_i
0     -0.129*                       |   1      1.3017†
1      0.688     0.2146    0.4633   |   1      0.688     0.2147    0.4633
1      0.893     0.0956    0.3091   |   0      0.280     0.2016    0.4490
0      0.178     0.1463    0.3825   |   0      0.178     0.1463    0.3825
0      0.280     0.2016    0.4490   |   1      0.688     0.2147    0.4633
1      0.995     0.00498   0.0705   |   0      0.178     0.1463    0.3825
1      1.098†                       |   1      1.097†
0      0.382     0.2361    0.4859   |   1      0.893     0.0956    0.3091
0     -0.0265*                      |   0      0.178     0.1463    0.3825
0      0.076     0.0702    0.2650   |   0      0.076     0.0702    0.2650
1      0.791     0.1653    0.4066   |   1      0.791     0.1653    0.4066
1      0.893     0.0956    0.3091   |   0      0.382     0.2361    0.4859
0      0.484     0.2497    0.4997   |   1      1.1997†
1      1.097†                       |   1      1.0971†
0     -0.333*                       |   0      0.178     0.1463    0.3825
1      0.995     0.00498   0.0705   |   0     -0.129*
1      0.688     0.2147    0.4633   |   1      0.791     0.1653    0.4066
0      0.076     0.0702    0.2650   |   1      0.688     0.2147    0.4633
0     -0.129*                       |   0     -0.231*
1      0.893     0.0956    0.3091   |   1      0.791     0.1653    0.4066

* Treated as zero to avoid probabilities being negative.
† Treated as unity to avoid probabilities exceeding one.
‡ $w_i = \hat{Y}_i(1 - \hat{Y}_i)$.


These results show that, compared with Eq. (15.2.10), the estimated standard errors are smaller and, correspondingly, the estimated t ratios (in absolute value) are larger. But one should take this result with a grain of salt since in estimating Eq. (15.2.11) we had to drop 12 observations. Also, since the $w_i$ are estimated, the usual statistical hypothesis-testing procedures are, strictly speaking, valid only in large samples (see Chapter 11).


8To avoid the loss of the degrees of freedom, we could let $\hat{Y}_i = 0.01$ when the estimated $\hat{Y}_i$ are negative and $\hat{Y}_i = 0.99$ when they are in excess of or equal to 1. See Exercise 15.1.
15.3 Applications of LPM


Until the availability of readily accessible computer packages to estimate the logit and probit models (to be 
discussed shortly), the LPM was used quite extensively because of its simplicity. We now illustrate some of 
these applications. 


Example 15.2 Cohen-Rea-Lerman Study⁹


In a study prepared for the U.S. Department of Labor, Cohen, Rea, and Lerman were interested in examining 
the labor-force participation of various categories of labor as a function of several socioeconomic-demographic 
variables. In all their regressions, the dependent variable is a dummy, taking a value of 1 if a person is in the 
labor force, 0 if he or she is not. In Table 15.3 we reproduce one of their several dummy-dependent variable 
regressions. 

Before interpreting the results, note these features: The preceding regression was estimated by using the 
OLS. To correct for heteroscedasticity, the authors used the two-step procedure outlined previously in some 
of their regressions but found that the standard errors of the estimates thus obtained did not differ materially 
from those obtained without correction for heteroscedasticity. Perhaps this result is due to the sheer size of 
the sample, namely, about 25,000. Because of this large sample size, the estimated t values may be tested 
for statistical significance by the usual OLS procedure even though the error term takes dichotomous values. 
The estimated R² of 0.175 may seem rather low, but in view of the large sample size, this R² is still significant on the basis of the F test (see Section 8.4). Finally, notice how the authors have blended quantitative and
qualitative variables and how they have taken into account the interaction effects. 

Turning to the interpretations of the findings, we see that each slope coefficient gives the rate of change in the conditional probability of the event occurring for a given unit change in the value of the explanatory variable. For instance, the coefficient of -0.2753 attached to the variable “age 65 and over” means, holding all other factors constant, that the probability of participation in the labor force by women in this age group is smaller by about 27 percentage points (as compared with the base category of women aged 22 to 54). By the same token, the coefficient of 0.3061 attached to the variable “16 or more years of schooling” means, holding all other factors constant, that the probability of women with this much education participating in the labor force is higher by about 31 percentage points (as compared with women with less than 5 years of schooling, the base category).

Now consider the interaction term marital status and age. The table shows that the labor-force participation probability is higher by some 29 percentage points for those women who were never married (as compared with the base category) and smaller by about 28 percentage points for those women who are 65 and over (again in relation to the base category). But the probability of participation of women who were never married and are 65 or over is smaller by about 20 percentage points as compared with the base category. This implies that women aged 65 and over but never married are likely to participate in the labor force more than those who are aged 65 and over and are married or fall into the “other” category.

Following this procedure, the reader can easily interpret the rest of the coefficients given in Table 15.3. From 
the given information, it is easy to obtain the estimates of the conditional probabilities of labor-force partici- 
pation of the various categories. Thus, if we want to find the probability for married women (other), aged 
22 to 54, with 12 to 15 years of schooling, with an unemployment rate of 2.5 to 3.4 percent, employment 
change of 3.5 to 6.49 percent, relative employment opportunities of 74 percent and over, and with FILOW 
of $7,500 and over, we obtain 


0.4368 + 0.1523 + 0.2231 — 0.0213 + 0.0301 + 0.0571 — 0.2455 = 0.6326 
In other words, the probability of labor-force participation by women with the preceding characteristics is 
estimated to be about 63 percent. 
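The calculation above simply adds the constant to the coefficients of the dummy categories that apply; base categories such as age 22 to 54 contribute zero. A minimal sketch follows. The category labels are our assumption, matched to the numbers in the order in which the text lists the characteristics.

```python
# Terms of the calculation in the text; labels are assumed from the listing order.
terms = {
    "constant": 0.4368,
    "married, other": 0.1523,
    "12 to 15 years of schooling": 0.2231,
    "unemployment rate 2.5 to 3.4 percent": -0.0213,
    "employment change 3.5 to 6.49 percent": 0.0301,
    "relative employment opportunities 74 percent and over": 0.0571,
    "FILOW of $7,500 and over": -0.2455,
}
# Age 22 to 54 is the base category, so it adds nothing to the sum.
probability = sum(terms.values())
print(round(probability, 4))  # 0.6326
```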


9Malcolm S. Cohen, Samuel A. Rea, Jr., and Robert I. Lerman, A Micro Model of Labor Supply, BLS Staff Paper 4, U.S. Department of Labor, 1970.


Table 15.3 Labor-Force Participation
Regression of women, age 22 and over, living in the largest 96 standard metropolitan statistical areas (SMSA) (dependent variable: in or out of labor force during 1966)

Explanatory Variable                              Coefficient    t Ratio
Constant                                          0.4368         15.4
Marital status
    Married, spouse present
    Married, other
    Never married
Age
    22-54
    55-64
    65 and over
Years of schooling
    0-4
    5-8
    9-11
    12-15
    16 and over
Unemployment rate (1966), %
    Under 2.5
    2.5-3.4
    3.5-4.0
    4.1-5.0
    5.1 and over
Employment change (1965-1966), %
    Under 3.5
    3.5-6.49
    6.5 and over
Relative employment opportunities, %
    Under 62
    62-73.9
    74 and over
FILOW, $
    Less than 1,500 and negative
    1,500-7,499
    7,500 and over
Interaction (marital status and age)
    Marital status     Age
    Other              55-64
    Other              65 and over
    Never married      55-64
    Never married      65 and over
Interaction (age and years of schooling completed)
    Age                Years of schooling
    65 and over        5-8
    65 and over        9-11
    65 and over        12-15
    65 and over        16 and over

R² = 0.175
No. of observations = 25,153

Note: A dash indicates the base or omitted category.
FILOW: family income less own wage and salary income.
Source: Malcolm S. Cohen, Samuel A. Rea, Jr., and Robert I. Lerman, A Micro Model of Labor Supply, BLS Staff Paper 4, U.S. Department of Labor, 1970, Table F-6, pp. 212-213.
Example 15.3 Predicting a Bond Rating


Based on pooled time series and cross-sectional data on 200 Aa (high-quality) and Baa (medium-quality) bonds over the period 1961-1966, Joseph Cappelleri estimated the following bond rating prediction model:¹⁰

$$Y_i = \beta_1 + \beta_2 X_{2i}^2 + \beta_3 X_{3i} + \beta_4 X_{4i} + \beta_5 X_{5i} + u_i$$

where
$Y_i$ = 1 if the bond rating is Aa (Moody's rating)
    = 0 if the bond rating is Baa (Moody's rating)
$X_2$ = debt capitalization ratio, a measure of leverage, $= \dfrac{\text{dollar value of long-term debt}}{\text{dollar value of total capitalization}}$
$X_3$ = profit rate $= \dfrac{\text{dollar value of after-tax income}}{\text{dollar value of net total assets}}$
$X_4$ = standard deviation of the profit rate, a measure of profit rate variability
$X_5$ = net total assets (thousands of dollars), a measure of size

A priori, $\beta_2$ and $\beta_4$ are expected to be negative (why?) and $\beta_3$ and $\beta_5$ are expected to be positive.

After correcting for heteroscedasticity and first-order autocorrelation, Cappelleri obtained the following results:¹¹


$$\hat{Y}_i = 0.6860 - 0.0179X_{2i}^2 + 0.0486X_{3i} + 0.0572X_{4i} + 0.378(\text{E-}7)X_{5i}$$

se = (0.1775)  (0.0024)  (0.0486)  (0.0178)  (0.039)(E-8)        (15.3.1)

R² = 0.6933

Note: 0.378(E-7) means 0.0000000378, etc.

All but the coefficient of $X_4$ have the correct signs. It is left to finance students to rationalize why the profit rate variability coefficient has a positive sign, for one would expect that the greater the variability in profits, the less likely it is that Moody's would give an Aa rating, other things remaining the same.

The interpretation of the regression is straightforward. For example, 0.0486 attached to $X_3$ means that, other things being the same, a 1 percentage point increase in the profit rate will lead on average to about a 0.05 increase in the probability of a bond getting the Aa rating. Similarly, the higher the squared leverage ratio, the lower by about 0.02 is the probability of a bond being classified as an Aa bond per unit increase in this ratio.


Example 15.4 Who Holds a Debit Card? 


Like credit cards, debit cards are now used extensively by consumers. Vendors prefer them because when 
you use a debit card, the amount of your purchase is automatically deducted from your checking or other 
designated account. To find out what factors determine the use of the debit card, we obtained data on 60 customers and considered the following model:¹²

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + u_i$$


10Joseph Cappelleri, “Predicting a Bond Rating,” unpublished term paper, C.U.N.Y. The model used in the paper is a modi- 
fication of the model used by Thomas F. Pogue and Robert M. Soldofsky, “What Is in a Bond Rating?” Journal of Financial 
and Quantitative Analysis, june 1969, pp. 201-228. 

"Some of the estimated probabilities before correcting for heteroscedasticity were negative and some were in excess of 1; 
in these cases they were assumed to be 0.01 and 0.99, respectively, to facilitate the computation of the weights w,. 

12The data used in the analysis are obtained from Douglas A. Lind, William G. Marchal, and Robert D. Mason, Statistical Techniques in Business and Economics, 11th Ed., McGraw-Hill, 2002, Appendix N, pp. 775-776. We have not used all the variables used by the authors.

where Y = 1 for debit card holder, 0 otherwise; \(X_2\) = account balance in dollars; \(X_3\) = number of ATM transactions; \(X_4\) = 1 if interest is received on the account, 0 otherwise.

Since the linear probability model (LPM) exhibits heteroscedasticity, we present the usual OLS results and 
the OLS results corrected for heteroscedasticity in a tabular form. 


Variable     Coefficient    Coefficient*
Constant     0.3631**       0.3631**
             (0.1796)       (0.1604)
Balance      0.00028**      0.00028**
             (0.00015)      (0.00014)
ATM          -0.0269        -0.0269
             (0.0208)       (0.0202)
Interest     -0.3019**      -0.3019**
             (0.1448)       (0.1353)
R²           0.1056         0.1056


Note: Standard errors are in parentheses. * denotes heteroscedasticity-corrected standard errors.
** Significant at about the 5% level.


As these results show, those who have higher account balances will tend to hold a debit card. Customers who
receive interest on their account balances are less likely to hold a debit card. Although the ATM variable
is not significant, note that it has a negative sign. This is perhaps due to ATM transaction fees.

There is not a vast difference between the estimated standard errors with and without heteroscedasticity
correction. To save space, we have not presented the fitted values (i.e., the estimated probabilities), but they
all were within the limits of 0 and 1. However, there is no guarantee that this will happen in every case.
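To see which coefficients in the table are significant, one can divide each coefficient by its heteroscedasticity-corrected standard error. The Python sketch below does this and attaches two-sided p values from the normal approximation; that approximation is an assumption of the sketch, since with 60 observations and four parameters the t distribution with 56 df would give slightly larger p values.

```python
import math

# Coefficients and heteroscedasticity-corrected standard errors,
# transcribed from the Coefficient* column of the table above.
results = {
    "Constant": (0.3631, 0.1604),
    "Balance": (0.00028, 0.00014),
    "ATM": (-0.0269, 0.0202),
    "Interest": (-0.3019, 0.1353),
}


def normal_two_sided_p(t):
    """Two-sided p value, 2 * (1 - Phi(|t|)), from the standard normal."""
    return math.erfc(abs(t) / math.sqrt(2.0))


t_ratios = {}
p_values = {}
for name, (coef, se) in results.items():
    t_ratios[name] = coef / se
    p_values[name] = normal_two_sided_p(t_ratios[name])
    print(f"{name:9s} t = {t_ratios[name]:7.3f}  p = {p_values[name]:.4f}")
```

The ATM ratio is well below 2 in absolute value, which is what the text means by "not significant".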


15.4 Alternatives to LPM 


As we have seen, the LPM is plagued by several problems, such as (1) non-normality of \(u_i\), (2) heteroscedasticity of \(u_i\), (3) possibility of \(\hat{Y}_i\) lying outside the 0-1 range, and (4) the generally lower \(R^2\) values. But these problems are surmountable. For example, we can use WLS to resolve the heteroscedasticity problem or increase the sample size to minimize the non-normality problem. By resorting to restricted least-squares or mathematical programming techniques we can even make the estimated probabilities lie in the 0-1 interval.

But even then the fundamental problem with the LPM is that it is not logically a very attractive model because it assumes that \(P_i = E(Y = 1 \mid X_i)\) increases linearly with X, that is, the marginal or incremental effect of X remains constant throughout. Thus, in our home ownership example we found that as X increases by a unit ($1,000), the probability of owning a house increases by the same constant amount of 0.10. This is so whether the income level is $8,000, $10,000, $18,000, or $22,000. This seems patently unrealistic. In reality one would expect that \(P_i\) is nonlinearly related to \(X_i\): At very low income a family will not own a house but at a sufficiently high level of income, say, \(X^*\), it most likely will own a house. Any increase in income beyond \(X^*\) will have little effect on the probability of owning a house. Thus, at both ends of the income distribution, the probability of owning a house will be virtually unaffected by a small increase in X.

Therefore, what we need is a (probability) model that has these two features: (1) As \(X_i\) increases, \(P_i = E(Y = 1 \mid X_i)\) increases but never steps outside the 0-1 interval, and (2) the relationship between \(P_i\) and \(X_i\) is nonlinear, that is, “one which approaches zero at slower and slower rates as \(X_i\) gets small and approaches one at slower and slower rates as \(X_i\) gets very large.”¹³


13John Aldrich and Forrest Nelson, op. cit., p. 26.

Figure 15.2 A cumulative distribution function (CDF).


Geometrically, the model we want would look something like Figure 15.2. Notice in this model that the 
probability lies between 0 and 1 and that it varies nonlinearly with X. 

The reader will realize that the sigmoid, or S-shaped, curve in the figure very much resembles the
cumulative distribution function (CDF) of a random variable.¹⁴ Therefore, one can easily use the CDF
to model regressions where the response variable is dichotomous, taking 0-1 values. The practical question
now is, which CDF? For although all CDFs are S-shaped, each random variable has its own unique CDF.
For historical as well as practical reasons, the CDFs commonly chosen to represent the 0-1 response models
are (1) the logistic and (2) the normal, the former giving rise to the logit model and the latter to the probit
(or normit) model.
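The two candidate CDFs can be compared numerically. In the Python sketch below, the logistic CDF is \(1/(1 + e^{-z})\) and the standard normal CDF is written with the error function from the standard `math` module; the function names are illustrative. Both stay between 0 and 1 and rise in an S shape, but the standard logistic is more spread out, so its tails approach 0 and 1 more slowly.

```python
import math


def logistic_cdf(z):
    """Standard logistic CDF, 1 / (1 + e^(-z)): the basis of the logit model."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Standard normal CDF via the error function: the basis of the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for z in (-4, -2, -1, 0, 1, 2, 4):
    print(f"z = {z:+d}  logistic = {logistic_cdf(z):.4f}  normal = {normal_cdf(z):.4f}")
```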

Although a detailed discussion of the logit and probit models is beyond the scope of this book, we will 
indicate somewhat informally how one estimates such models and how one interprets them. 


15.5 The Logit Model 


We will continue with our home ownership example to explain the basic ideas underlying the logit model. 
Recall that in explaining home ownership in relation to income, the LPM was 


\[P_i = \beta_1 + \beta_2 X_i \qquad (15.5.1)\]


where X is income and \(P_i = E(Y_i = 1 \mid X_i)\) is the probability that the family owns a house. But now consider the following
representation of home ownership:


\[P_i = \frac{1}{1 + e^{-(\beta_1 + \beta_2 X_i)}} \qquad (15.5.2)\]
For ease of exposition, we write Eq. (15.5.2) as
\[P_i = \frac{1}{1 + e^{-Z_i}} = \frac{e^{Z_i}}{1 + e^{Z_i}} \qquad (15.5.3)\]
where \(Z_i = \beta_1 + \beta_2 X_i\).

14As discussed in Appendix A, the CDF of a random variable X is simply the probability that it takes a value less than or equal to \(X_0\), where \(X_0\) is some specified numerical value of X. In short, \(F(X)\), the CDF of X, is \(F(X = X_0) = P(X \le X_0)\).

Equation (15.5.3) represents what is known as the (cumulative) logistic distribution function.¹⁵

It is easy to verify that as \(Z_i\) ranges from \(-\infty\) to \(+\infty\), \(P_i\) ranges between 0 and 1¹⁶ and that \(P_i\) is nonlinearly related to \(Z_i\) (i.e., \(X_i\)), thus satisfying the two requirements considered earlier. But it seems that in satisfying these requirements, we have created an estimation problem because \(P_i\) is nonlinear not only in X but also in the β's, as can be seen clearly from Eq. (15.5.2). This means that we cannot use the familiar OLS procedure to estimate the parameters.¹⁷ But this problem is more apparent than real because Eq. (15.5.2) can be linearized, which can be shown as follows.

If \(P_i\), the probability of owning a house, is given by Eq. (15.5.3), then \((1 - P_i)\), the probability of not owning a house, is
\[1 - P_i = \frac{1}{1 + e^{Z_i}} \qquad (15.5.4)\]
Therefore, we can write
\[\frac{P_i}{1 - P_i} = \frac{1 + e^{Z_i}}{1 + e^{-Z_i}} = e^{Z_i} \qquad (15.5.5)\]
Now \(P_i/(1 - P_i)\) is simply the odds ratio in favor of owning a house: the ratio of the probability that a family will own a house to the probability that it will not own a house. Thus, if \(P_i = 0.8\), it means that odds are 4 to 1 in favor of the family owning a house.
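The conversions among a probability, its odds, and its logit used in Eqs. (15.5.3) to (15.5.6) can be written as three small Python functions. The function names are illustrative; the example reproduces the 4-to-1 odds for \(P_i = 0.8\) and checks that the logistic function undoes the logit.

```python
import math


def odds(p):
    """Odds in favor of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)


def logit(p):
    """Log of the odds, L = ln[p / (1 - p)]."""
    return math.log(odds(p))


def inv_logit(z):
    """Inverse of the logit: the logistic function of Eq. (15.5.3)."""
    return 1.0 / (1.0 + math.exp(-z))


print(odds(0.8))             # odds of about 4 to 1
print(logit(0.8))            # about ln 4
print(inv_logit(logit(0.8))) # back to about 0.8
```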
Now if we take the natural log of Eq. (15.5.5), we obtain a very interesting result, namely,
\[L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = Z_i = \beta_1 + \beta_2 X_i \qquad (15.5.6)\]
that is, L, the log of the odds ratio, is not only linear in X, but also (from the estimation viewpoint) linear in the parameters.¹⁸ L is called the logit, and hence the name logit model for models like Eq. (15.5.6).
Notice these features of the logit model.


1. As P goes from 0 to 1 (i.e., as Z varies from \(-\infty\) to \(+\infty\)), the logit L goes from \(-\infty\) to \(+\infty\). That is, although the probabilities (of necessity) lie between 0 and 1, the logits are not so bounded.

2. Although L is linear in X, the probabilities themselves are not. This property is in contrast with the LPM model (15.5.1), where the probabilities increase linearly with X.¹⁹

3. Although we have included only a single X variable, or regressor, in the preceding model, one can add 
as many regressors as may be dictated by the underlying theory. 


15The logistic model has been used extensively in analyzing growth phenomena, such as population, GNP, money supply, etc. For theoretical and practical details of logit and probit models, see J. S. Cramer, The Logit Model for Economists, Edward Arnold Publishers, London, 1991; and G. S. Maddala, op. cit.

16Note that as \(Z_i \to +\infty\), \(e^{-Z_i}\) tends to zero and as \(Z_i \to -\infty\), \(e^{-Z_i}\) increases indefinitely. Recall that \(e \approx 2.71828\).

17Of course, one could use nonlinear estimation techniques discussed in Chapter 14. See also Section 15.8.

18Recall that the linearity assumption of OLS does not require that the X variable be necessarily linear. So we can have \(X^2\), \(X^3\), etc., as regressors in the model. For our purpose, it is linearity in the parameters that is crucial.

19Using calculus, it can be shown that \(dP/dX = \beta_2 P(1 - P)\), which shows that the rate of change in probability with respect to X involves not only \(\beta_2\) but also the level of probability from which the change is measured (but more on this in Section 15.7). In passing, note that the effect of a unit change in \(X_i\) on P is greatest when P = 0.5 and least when P is close to 0 or 1.

4. If L, the logit, is positive, the odds that the regressand equals 1 (meaning some event of interest happens) exceed 1; if L is negative, those odds are less than 1. Accordingly, when the slope coefficient is positive, the odds that the regressand equals 1 increase as the value of the regressor(s) increases, and when it is negative they decrease. To put it differently, the logit becomes negative and increasingly large in magnitude as the odds ratio decreases from 1 to 0 and becomes increasingly large and positive as the odds ratio increases from 1 to infinity.²⁰

5. More formally, the interpretation of the logit model given in Eq. (15.5.6) is as follows: \(\beta_2\), the slope, measures the change in L for a unit change in X, that is, it tells how the log-odds in favor of owning a house change as income changes by a unit, say, $1,000. The intercept \(\beta_1\) is the value of the log-odds in favor of owning a house if income is zero. Like most interpretations of intercepts, this interpretation may not have any physical meaning.

6. Given a certain level of income, say, \(X^*\), if we actually want to estimate not the odds in favor of owning a house but the probability of owning a house itself, this can be done directly from Eq. (15.5.3) once the estimates of \(\beta_1\) and \(\beta_2\) are available. This, however, raises the most important question: How do we estimate \(\beta_1\) and \(\beta_2\) in the first place? The answer is given in the next section.

7. Whereas the LPM assumes that \(P_i\) is linearly related to \(X_i\), the logit model assumes that the log of the odds ratio is linearly related to \(X_i\).
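Footnote 19 gives the marginal effect \(dP/dX = \beta_2 P(1 - P)\). The Python sketch below checks this formula against a numerical derivative and confirms that the effect peaks where P = 0.5. The parameter values are an assumption made for illustration: they are the glogit estimates reported later in Eq. (15.7.1), treated here simply as \(\beta_1\) and \(\beta_2\) of Eq. (15.5.6).

```python
import math

# Assumed illustrative values: the glogit estimates of Eq. (15.7.1).
BETA1, BETA2 = -1.59474, 0.07862


def prob(x, b1=BETA1, b2=BETA2):
    """Logistic probability P = 1 / (1 + e^-(b1 + b2*x)), as in Eq. (15.5.2)."""
    return 1.0 / (1.0 + math.exp(-(b1 + b2 * x)))


def dp_dx(x, b1=BETA1, b2=BETA2):
    """Analytic marginal effect from footnote 19: dP/dX = b2 * P * (1 - P)."""
    p = prob(x, b1, b2)
    return b2 * p * (1.0 - p)


def dp_dx_numeric(x, h=1e-5):
    """Central finite difference, used only to check the analytic formula."""
    return (prob(x + h) - prob(x - h)) / (2.0 * h)


for x in (0, 10, 20, 40, 60):
    print(f"X = {x:2d}  P = {prob(x):.4f}  "
          f"dP/dX = {dp_dx(x):.5f}  numeric = {dp_dx_numeric(x):.5f}")

# The effect is largest where P = 0.5, i.e. where b1 + b2*X = 0.
x_peak = -BETA1 / BETA2
print(f"peak at X = {x_peak:.2f}, where dP/dX = b2/4 = {dp_dx(x_peak):.5f}")
```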


15.6 Estimation of the Logit Model 


For estimation purposes, we write Eq. (15.5.6) as follows: 


\[L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = \beta_1 + \beta_2 X_i + u_i \qquad (15.6.1)\]
We will discuss the properties of the stochastic error term \(u_i\) shortly.
To estimate Eq. (15.6.1), we need, apart from \(X_i\), the values of the regressand, or logit, \(L_i\). This depends on the type of data we have for analysis. We distinguish two types of data: (1) data at the individual, or micro, level, and (2) grouped or replicated data.


Data at the Individual Level 


If we have data on individual families, as in the case of Table 15.1, OLS estimation of Eq. (15.6.1) is infeasible. This is easy to see. In terms of the data given in Table 15.1, \(P_i = 1\) if a family owns a house and \(P_i = 0\) if it does not own a house. But if we put these values directly into the logit \(L_i\) we obtain:

\[L_i = \ln\left(\frac{1}{0}\right) \quad \text{if a family owns a house}\]

\[L_i = \ln\left(\frac{0}{1}\right) \quad \text{if a family does not own a house}\]

Obviously, these expressions are meaningless. Therefore, if we have data at the micro, or individual, level, we cannot estimate Eq. (15.6.1) by the standard OLS routine. In this situation we may have to resort to the maximum-likelihood (ML) method to estimate the parameters. Although the rudiments of this method were discussed in the appendix to Chapter 4, its application in the present context will be discussed in Appendix 15A, Section 15A.1, for the benefit of readers who would like to learn more about it.²¹ Software packages, such as MICROFIT, EViews, LIMDEP, SHAZAM, PC-GIVE, STATA, and MINITAB, have built-in routines to estimate the logit model at the individual level. We will illustrate the use of the ML method later in the chapter.

20This point is due to David Carson.


Grouped or Replicated Data 


Now consider the data given in Table 15.4. This table gives data on several families grouped or replicated (repeat observations) according to income level and the number of families owning a house at each income level. Corresponding to each income level \(X_i\), there are \(N_i\) families, \(n_i\) among whom are home owners (\(n_i \le N_i\)). Therefore, if we compute
\[\hat{P}_i = \frac{n_i}{N_i} \qquad (15.6.2)\]
that is, the relative frequency, we can use it as an estimate of the true \(P_i\) corresponding to each \(X_i\). If \(N_i\) is fairly large, \(\hat{P}_i\) will be a reasonably good estimate of \(P_i\).²² Using the estimated \(P_i\), we can obtain the estimated logit as
\[\hat{L}_i = \ln\left(\frac{\hat{P}_i}{1 - \hat{P}_i}\right) = \hat{\beta}_1 + \hat{\beta}_2 X_i \qquad (15.6.3)\]
which will be a fairly good estimate of the true logit \(L_i\) if the number of observations \(N_i\) at each \(X_i\) is reasonably large.


Table 15.4 Hypothetical Data on \(X_i\) (Income), \(N_i\) (Number of Families at Income \(X_i\)), and \(n_i\) (Number of Families Owning a House)

X_i (thousands of dollars)    N_i    n_i
 6                             40      8
 8                             50     12
10                             60     18
13                             80     28
15                            100     45
20                             70     36
25                             65     39
30                             50     33
35                             40     30


21For a comparatively simple discussion of maximum likelihood in the context of the logit model, see John Aldrich and Forrest Nelson, op. cit., pp. 49-54. See also Alfred DeMaris, Logit Modeling: Practical Applications, Sage Publications, Newbury Park, Calif., 1992.

22From elementary statistics recall that the probability of an event is the limit of the relative frequency as the sample size becomes infinitely large.

In short, given the grouped or replicated data, such as Table 15.4, one can obtain the data on the dependent variable, the logits, to estimate the model (15.6.1). Can we then apply OLS to Eq. (15.6.3) and estimate the parameters in the usual fashion? The answer is, not quite, since we have not yet said anything about the properties of the stochastic disturbance term. It can be shown that if \(N_i\) is fairly large and if each observation in a given income class \(X_i\) is distributed independently as a binomial variable, then²³
\[u_i \sim N\left[0, \frac{1}{N_i P_i (1 - P_i)}\right] \qquad (15.6.4)\]
that is, \(u_i\) follows the normal distribution with zero mean and variance equal to \(1/[N_i P_i (1 - P_i)]\).


Therefore, as in the case of the LPM, the disturbance term in the logit model is heteroscedastic. Thus, instead of using OLS we will have to use weighted least squares (WLS).²⁵ For empirical purposes, however, we will replace the unknown \(P_i\) by \(\hat{P}_i\) and use
\[\hat{\sigma}^2 = \frac{1}{N_i \hat{P}_i (1 - \hat{P}_i)} \qquad (15.6.5)\]
as estimator of \(\sigma^2\).


We now describe the various steps in estimating the logit regression in Eq. (15.6.1): 


1. For each income level X, compute the probability of owning a house as \(\hat{P}_i = n_i/N_i\).
2. For each \(X_i\), obtain the logit as²⁴
\[\hat{L}_i = \ln\left[\frac{\hat{P}_i}{1 - \hat{P}_i}\right]\]
3. To resolve the problem of heteroscedasticity, transform Eq. (15.6.1) as follows:
\[\sqrt{w_i}\,L_i = \beta_1 \sqrt{w_i} + \beta_2 \sqrt{w_i}\,X_i + \sqrt{w_i}\,u_i \qquad (15.6.6)\]
which we write as
\[L_i^* = \beta_1 \sqrt{w_i} + \beta_2 X_i^* + v_i \qquad (15.6.7)\]
where the weights \(w_i = N_i \hat{P}_i (1 - \hat{P}_i)\); \(L_i^*\) = transformed or weighted \(L_i\); \(X_i^*\) = transformed or weighted \(X_i\); and \(v_i\) = transformed error term. It is easy to verify that the transformed error term \(v_i\) is homoscedastic, keeping in mind that the original error variance is \(\sigma_u^2 = 1/[N_i P_i (1 - P_i)]\).

4. Estimate Eq. (15.6.6) by OLS—recall that WLS is OLS on the transformed data. Notice that in 
Eq. (15.6.6) there is no intercept term introduced explicitly (why?). Therefore, one will have to use the 
regression through the origin routine to estimate Eq. (15.6.6). 

5. Establish confidence intervals and/or test hypotheses in the usual OLS framework, but keep in mind 
that all the conclusions will be valid strictly speaking only if the sample is reasonably large (why?). 
Therefore, in small samples, the estimated results should be interpreted carefully. 
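Steps 1 to 4 can be sketched in Python with NumPy as follows. The data are the rows of Table 15.4 as printed in this section; the worksheet of Table 15.5 is not reproduced here, so the printed estimates should not be read as a reproduction of Eq. (15.7.1). The point of the sketch is the mechanics: the weights \(w_i = N_i \hat{P}_i (1 - \hat{P}_i)\), the transformed variables of Eq. (15.6.7), and a regression through the origin on \(\sqrt{w_i}\) and \(X_i^*\).

```python
import numpy as np

# Table 15.4 as printed here: income X ($1,000s), families N, home owners n.
X = np.array([6, 8, 10, 13, 15, 20, 25, 30, 35], dtype=float)
N = np.array([40, 50, 60, 80, 100, 70, 65, 50, 40], dtype=float)
n = np.array([8, 12, 18, 28, 45, 36, 39, 33, 30], dtype=float)

# Step 1: relative frequencies, Eq. (15.6.2).
P_hat = n / N
# Step 2: estimated logits, Eq. (15.6.3).
L_hat = np.log(P_hat / (1.0 - P_hat))
# Weights: the estimated error variance is 1 / w_i, Eq. (15.6.5).
w = N * P_hat * (1.0 - P_hat)
sw = np.sqrt(w)

# Step 3: transformed variables of Eqs. (15.6.6)-(15.6.7).
L_star = sw * L_hat
X_star = sw * X

# Step 4: OLS through the origin; the regressors are sqrt(w_i) and X*_i.
Z = np.column_stack([sw, X_star])
beta, *_ = np.linalg.lstsq(Z, L_star, rcond=None)
b1, b2 = beta
print(f"b1 = {b1:.5f}, b2 = {b2:.5f}")
```

Because the two regressors are \(\sqrt{w_i}\) and \(\sqrt{w_i}X_i\), the coefficient on \(\sqrt{w_i}\) plays the role of the intercept \(\beta_1\); this is why no separate constant is needed.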


23As shown in elementary probability theory, \(\hat{P}_i\), the proportion of successes (here, owning a house), follows the binomial distribution with mean equal to true \(P_i\) and variance equal to \(P_i(1 - P_i)/N_i\); and as \(N_i\) increases indefinitely the binomial distribution approximates the normal distribution. The distributional properties of \(u_i\) given in Eq. (15.6.4) follow from this basic theory. For details, see Henry Theil, “On the Relationships Involving Qualitative Variables,” American Journal of Sociology, vol. 76, July 1970, pp. 103-154.

24Since \(\hat{P}_i = n_i/N_i\), \(\hat{L}_i\) can be alternatively expressed as \(\hat{L}_i = \ln[n_i/(N_i - n_i)]\). In passing it should be noted that to avoid \(\hat{P}_i\) taking the value of 0 or 1, in practice \(\hat{L}_i\) is measured as \(\hat{L}_i = \ln[(n_i + \tfrac{1}{2})/(N_i - n_i + \tfrac{1}{2})] = \ln[(\hat{P}_i + 1/2N_i)/(1 - \hat{P}_i + 1/2N_i)]\). It is recommended as a rule of thumb that \(N_i\) be at least 5 at each value of \(X_i\). For additional details, see D. R. Cox, Analysis of Binary Data, Methuen, London, 1970, p. 33.


25If we estimate Eq. (15.6.1) disregarding heteroscedasticity, the estimators, although unbiased, will not be efficient, as we know from Chapter 11.


15.7 The Grouped Logit (Glogit) Model: A Numerical Example


To illustrate the theory just discussed, we will use the data given in Table 15.4. Since the data in the table are grouped, the logit model based on these data will be called a grouped logit model, glogit, for short. The necessary raw data and other relevant calculations necessary to implement glogit are given in Table 15.5. The results of the weighted least-squares regression (15.6.7) based on the data given in Table 15.5 are as follows (note that there is no intercept in Eq. (15.6.7); hence the regression-through-the-origin procedure is appropriate here):


\[\hat{L}_i^* = -1.59474\sqrt{w_i} + 0.07862X_i^*\]
se = (0.11046)      (0.00539)                      (15.7.1)
t = (-14.43619)     (14.56675)     R² = 0.9642

The R² is the squared correlation coefficient between actual and estimated \(L_i^*\). \(L_i^*\) and \(X_i^*\) are weighted \(L_i\) and \(X_i\), as shown in Eq. (15.6.6). Although we have shown the calculations of the grouped logit in Table 15.5 for pedagogical reasons, this can be done easily by invoking the glogit (grouped logit) command in STATA.


Interpretation of the Estimated Logit Model 
How do we interpret Eq. (15.7.1)? There are various ways, some intuitive and some not: 


Logit Interpretation 


As Eq. (15.7.1) shows, the estimated slope coefficient suggests that for a unit ($1,000) increase in weighted 
income, the weighted log of the odds in favor of owning a house goes up by 0.08 units. This mechanical 
interpretation, however, is not very appealing. 


Odds Interpretation 


Remember that \(L_i = \ln[P_i/(1 - P_i)]\). Therefore, taking the antilog of the estimated logit, we get \(P_i/(1 - P_i)\), that is, the odds ratio. Hence, taking the antilog of Eq. (15.7.1), we obtain:
\[\frac{\hat{P}_i}{1 - \hat{P}_i} = e^{-1.59474\sqrt{w_i} + 0.07862X_i^*} = e^{-1.59474\sqrt{w_i}}\, e^{0.07862X_i^*} \qquad (15.7.2)\]


Using a calculator, you can easily verify that \(e^{0.07862} \approx 1.0818\). This means that for a unit increase in weighted income, the (weighted) odds in favor of owning a house are multiplied by about 1.0818, that is, they increase by about 8.18 percent. In general, if you take the antilog of the jth slope coefficient (in case there is more than one regressor in the model), subtract 1 from it, and multiply the result by 100, you will get the percent change in the odds for a unit increase in the jth regressor.
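The rule just stated, percent change in odds \(= (e^{\hat{\beta}_j} - 1) \times 100\), is a one-line computation. A Python sketch using the slope of Eq. (15.7.1); the function name is illustrative:

```python
import math


def pct_change_in_odds(coef):
    """Percent change in the odds for a one-unit increase in a regressor."""
    return (math.exp(coef) - 1.0) * 100.0


slope = 0.07862   # slope coefficient of Eq. (15.7.1)
print(f"odds multiplier = {math.exp(slope):.4f}")
print(f"percent change in odds = {pct_change_in_odds(slope):.2f}")
```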

Incidentally, if you want to carry the analysis in terms of unweighted logit, all you have to do is divide the estimated \(L_i^*\) by \(\sqrt{w_i}\). Table 15.6 gives the estimated weighted and unweighted logits for each observation and some other data, which we will discuss shortly.


Computing Probabilities


Since the language of logit and odds ratio may be unfamiliar to some, we can always compute the probability of owning a house at a certain level of income. Suppose we want to compute this probability at X = 20 ($20,000). Plugging this value into Eq. (15.7.1), we obtain \(\hat{L}_i^* = -0.09311\), and dividing this by \(\sqrt{w_i} = 4.1816\) (see Table 15.5), we obtain \(\hat{L}_i = -0.02226\). Therefore, at the income level of $20,000, we have
\[-0.02199 = \ln\left(\frac{\hat{P}_i}{1 - \hat{P}_i}\right)\]
Therefore,
\[\frac{\hat{P}_i}{1 - \hat{P}_i} = e^{-0.02199} = 0.97825\]
Solving this for
\[\hat{P}_i = \frac{e^{-0.02199}}{1 + e^{-0.02199}}\]
the reader can see that the estimated probability is 0.4945. That is, given the income of $20,000, the probability of a family owning a house is about 49 percent. Table 15.6 shows the probabilities thus computed at various income levels. As this table shows, the probability of house ownership increases with income, but not linearly as with the LPM model.


Computing the Rate of Change of Probability 


As you can gather from Table 15.6, the probability of owning a house depends on the income level. How can we compute the rate of change of probabilities as income varies? As noted in footnote 19, that depends not only on the estimated slope coefficient \(\hat{\beta}_2\) but also on the level of the probability from which the change is measured; the latter of course depends on the income level at which the probability is computed.

To illustrate, suppose we want to measure the change in the probability of owning a house at the income level $20,000. Then, from footnote 19 the change in probability for a unit increase in income from the level 20 (thousand) is: \(\hat{\beta}_2(1 - \hat{P})\hat{P} = 0.07862(0.5056)(0.4944) = 0.01965\).
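The two computations at X = 20, converting the logit back to a probability and then applying \(\hat{\beta}_2 \hat{P}(1 - \hat{P})\), can be sketched in Python. The sketch starts from \(\hat{L}_i = -0.02226\) as obtained above; that value gives \(\hat{P} \approx 0.4944\), matching the factors (0.5056)(0.4944) used in the rate-of-change calculation, while the value -0.02199 used in the probability calculation gives 0.4945. Either way the probability is about 49 percent.

```python
import math

b2 = 0.07862      # slope coefficient, Eq. (15.7.1)
L_20 = -0.02226   # unweighted logit at X = 20 (text above)

# Invert the logit: P = e^L / (1 + e^L).
p_20 = math.exp(L_20) / (1.0 + math.exp(L_20))

# Rate of change from footnote 19: dP/dX = b2 * P * (1 - P).
change = b2 * p_20 * (1.0 - p_20)
print(f"P = {p_20:.4f}, change in P per $1,000 = {change:.5f}")
```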

It is left as an exercise for the reader to show that at 0.020 
income level $40,000, the change in probability is 0.01 135. 
Table 15.6 shows the change in probability of owning a 
house at various income levels; these probabilities are also 0.018 


0.019 


P 
depicted in Figure 15.3. 2 0.017 
To conclude our discussion of the glogit model, 5 aie 
we present the results based on OLS, or unweighted 4 ` 
regression, for the home ownership example: 4 0.015 
ap 
= & 0.014 
i; = -1.6587 + 0.0792X; Ep 
> 0.013 
= i j (15.7.3) 
se (0.0958) (0.0041) R 
t = (—17.32) (19.11) r? = 0.9786 0.011 
? a "Ss 10° 15 20S WO 5 AO 45 
We leave it to the reader to compare this regression xX. 1fitUine thousmarordolikrs 


with the weighted least-squares regression given by Eq. Figure 15.3 Change in probability in relation to 
£15.7.1). , income. 
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15.8 The Logit Model for Ungrouped or Individual Data 


To set the stage, consider the data given in Table 15.7. Letting Y = 1 if a student's final grade in an intermediate microeconomics course was A and Y = 0 if the final grade was a B or a C, Spector and Mazzeo used grade point average (GPA), TUCE, and Personalized System of Instruction (PSI) as the grade predictors. The logit model here can be written as:
\[L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = \beta_1 + \beta_2\,\text{GPA}_i + \beta_3\,\text{TUCE}_i + \beta_4\,\text{PSI}_i + u_i \qquad (15.8.1)\]

As we noted in Section 15.6, with individual data we cannot simply put \(P_i = 1\) if the event occurs (there, a family owns a house) and zero if it does not, because the logit is then undefined. Here neither OLS nor weighted least squares (WLS) is helpful. We have to resort to nonlinear estimating procedures using the method of maximum likelihood. The details of this method are given in Appendix 15A, Section 15A.1. Since most modern statistical packages have routines to estimate logit models on the basis of ungrouped data, we will present the results of model (15.8.1) using the data given in Table 15.7 and show how to interpret the results. The results are given in Table 15.8 in tabular form and are obtained by using EViews 6. Before interpreting these results, some general observations are in order.

1. Since we are using the method of maximum likelihood, which is generally a large-sample method, the 
estimated standard errors are asymptotic. 

2. As a result, instead of using the t statistic to evaluate the statistical significance of a coefficient, we use the (standard normal) Z statistic. So inferences are based on the normal table. Recall that if the sample size is reasonably large, the t distribution converges to the normal distribution.


Table 15.7 Data on the Effect of Personalized System of Instruction (PSI) on Course Grades

Observation  GPA   TUCE  PSI  Grade  Letter grade     Observation  GPA   TUCE  PSI  Grade  Letter grade
 1           2.66  20    0    0      C                17           2.75  25    0    0      C
 2           2.89  22    0    0      B                18           2.83  19    0    0      C
 3           3.28  24    0    0      B                19           3.12  23    1    0      B
 4           2.92  12    0    0      B                20           3.16  25    1    1      A
 5           4.00  21    0    1      A                21           2.06  22    1    0      C
 6           2.86  17    0    0      B                22           3.62  28    1    1      A
 7           2.76  17    0    0      B                23           2.89  14    1    0      C
 8           2.87  21    0    0      B                24           3.51  26    1    0      B
 9           3.03  25    0    0      C                25           3.54  24    1    1      A
10           3.92  29    0    1      A                26           2.83  27    1    1      A
11           2.63  20    0    0      C                27           3.39  17    1    1      A
12           3.32  23    0    0      B                28           2.67  24    1    0      B
13           3.57  23    0    0      B                29           3.65  21    1    1      A
14           3.26  25    0    1      A                30           4.00  23    1    1      A
15           3.53  26    0    0      B                31           3.10  21    1    0      C
16           2.74  19    0    0      B                32           2.39  19    1    1      A

Notes: Grade Y = 1 if the final grade is A
             = 0 if the final grade is B or C
TUCE = score on an examination given at the beginning of the term to test entering knowledge of macroeconomics
PSI = 1 if the new teaching method is used
    = 0 otherwise
GPA = the entering grade point average

Source: L. Spector and M. Mazzeo, “Probit Analysis and Economic Education,” Journal of Economic Education, vol. 11, 1980, pp. 37-44.

Table 15.8 Regression Results of Equation (15.8.1)

Dependent Variable: Grade
Method: ML-Binary Logit
Convergence achieved after 5 iterations

Variable   Coefficient   Std. Error   Z Statistic   Probability
C          -13.0213      4.9313       -2.6405       0.0082
GPA          2.8261      1.2629        2.2377       0.0252
TUCE         0.0951      0.1415        0.6722       0.5014
PSI          2.3786      1.0645        2.2344       0.0255

McFadden R² = 0.3740    LR statistic (3 df) = 15.40419


3. As noted earlier, the conventional measure of goodness of fit, \(R^2\), is not particularly meaningful in binary regressand models. Measures similar to \(R^2\), called pseudo \(R^2\), are available, and there are a variety of them.²⁶ EViews presents one such measure, the McFadden \(R^2\), denoted by \(R^2_{\text{McF}}\), whose value in our example is 0.3740.²⁷ Like \(R^2\), \(R^2_{\text{McF}}\) also ranges between 0 and 1. Another comparatively simple measure of goodness of fit is the count \(R^2\), which is defined as:
\[\text{Count } R^2 = \frac{\text{number of correct predictions}}{\text{total number of observations}} \qquad (15.8.2)\]
Since the regressand in the logit model takes a value of 1 or zero, if the predicted probability is greater than 0.5, we classify that as 1, but if it is less than 0.5, we classify that as 0. We then count the number of correct predictions and compute the count \(R^2\) as given in Eq. (15.8.2). We will illustrate this shortly.

It should be noted, however, that in binary regressand models, goodness of fit is of secondary 
importance. What matters is the expected signs of the regression coefficients and their statistical and/or 
practical significance. 

4. To test the null hypothesis that all the slope coefficients are simultaneously equal to zero, the equivalent of the F test in the linear regression model is the likelihood ratio (LR) statistic. Given the null hypothesis, the LR statistic follows the \(\chi^2\) distribution with df equal to the number of explanatory variables, three in the present example. (Note: Exclude the intercept term in computing the df.)

Now let us interpret the regression results given in Table 15.8. Each slope coefficient in the estimated model is
a partial slope coefficient and measures the change in the estimated logit for a unit change in the value of the 
given regressor (holding other regressors constant). Thus, the GPA coefficient of 2.8261 means, with other 
variables held constant, that if GPA increases by a unit, on average the estimated logit increases by about 
2.83 units, suggesting a positive relationship between the two. As you can see, all the other regressors have a 
positive effect on the logit, although statistically the effect of TUCE is not significant. However, together all 
the regressors have a significant impact on the final grade, as the LR statistic is 15.40 with a p value of about 
0.0015, which is very small. 
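Packages such as EViews maximize the logit likelihood iteratively. A minimal Newton-Raphson version in Python with NumPy is sketched below, applied to the data of Table 15.7. It is not EViews' own algorithm, but because the logit log likelihood is concave it should reach the same maximum. The sketch also computes the asymptotic standard errors from the inverse of the negative Hessian, the LR statistic, and the McFadden \(R^2\) defined in footnote 27. The helper names `log_likelihood` and `fit_logit` are illustrative.

```python
import numpy as np

# Table 15.7: GPA, TUCE, PSI, Grade (1 = A, 0 = B or C), observations 1-32.
data = np.array([
    [2.66, 20, 0, 0], [2.89, 22, 0, 0], [3.28, 24, 0, 0], [2.92, 12, 0, 0],
    [4.00, 21, 0, 1], [2.86, 17, 0, 0], [2.76, 17, 0, 0], [2.87, 21, 0, 0],
    [3.03, 25, 0, 0], [3.92, 29, 0, 1], [2.63, 20, 0, 0], [3.32, 23, 0, 0],
    [3.57, 23, 0, 0], [3.26, 25, 0, 1], [3.53, 26, 0, 0], [2.74, 19, 0, 0],
    [2.75, 25, 0, 0], [2.83, 19, 0, 0], [3.12, 23, 1, 0], [3.16, 25, 1, 1],
    [2.06, 22, 1, 0], [3.62, 28, 1, 1], [2.89, 14, 1, 0], [3.51, 26, 1, 0],
    [3.54, 24, 1, 1], [2.83, 27, 1, 1], [3.39, 17, 1, 1], [2.67, 24, 1, 0],
    [3.65, 21, 1, 1], [4.00, 23, 1, 1], [3.10, 21, 1, 0], [2.39, 19, 1, 1],
])
y = data[:, 3]
X = np.column_stack([np.ones(len(y)), data[:, :3]])   # C, GPA, TUCE, PSI


def log_likelihood(beta):
    """Logit log likelihood: sum of y*z - ln(1 + e^z), with z = X beta."""
    z = X @ beta
    return float(np.sum(y * z - np.logaddexp(0.0, z)))


def fit_logit(X, y, tol=1e-10, max_iter=50):
    """Maximize the logit likelihood by Newton-Raphson, starting from zero."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                              # score vector
        hess = -(X * (p * (1.0 - p))[:, None]).T @ X      # Hessian matrix
        step = np.linalg.solve(hess, grad)
        beta = beta - step                                # Newton update
        if np.max(np.abs(step)) < tol:
            break
    # Hessian at the final estimates, used for the standard errors.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    hess = -(X * (p * (1.0 - p))[:, None]).T @ X
    return beta, hess


beta, hess = fit_logit(X, y)
se = np.sqrt(np.diag(np.linalg.inv(-hess)))   # asymptotic standard errors
z_stat = beta / se

# Restricted model (intercept only): fitted probability = sample share of A's.
p0 = y.mean()
ll_r = float(np.sum(y * np.log(p0) + (1.0 - y) * np.log(1.0 - p0)))
ll_u = log_likelihood(beta)
lr_stat = 2.0 * (ll_u - ll_r)     # compare with chi-square, 3 df
mcfadden_r2 = 1.0 - ll_u / ll_r

for name, b, s, z in zip(["C", "GPA", "TUCE", "PSI"], beta, se, z_stat):
    print(f"{name:5s} {b:9.4f} {s:8.4f} {z:8.4f}")
print(f"LR = {lr_stat:.5f}, McFadden R^2 = {mcfadden_r2:.4f}")
```

The fitted logit for any student is the corresponding row of `X` times `beta`; for student 10 this is the value used in the probability calculation below.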


26For an accessible discussion, see J. Scott Long, Regression Models for Categorical and Limited Dependent Variables, Sage 
Publications, Newbury Park, California, 1997, pp. 102-113. 

27Technically, this is defined as \(1 - (\text{LLF}_{ur}/\text{LLF}_r)\), where \(\text{LLF}_{ur}\) is the unrestricted log likelihood function where all regressors are included in the model and \(\text{LLF}_r\) is the restricted log likelihood function where only the intercept is included in the model. Conceptually, \(\text{LLF}_{ur}\) is equivalent to RSS and \(\text{LLF}_r\) is equivalent to TSS of the linear regression model.

As noted previously, a more meaningful interpretation is in terms of odds, which are obtained by taking the antilog of the various slope coefficients. Thus, if you take the antilog of the PSI coefficient of 2.3786 you will get 10.7897 (\(\approx e^{2.3786}\)). This suggests that, other things remaining the same, the odds of getting an A for students who are exposed to the new method of teaching are more than 10 times the odds for students who are not exposed to it.

Suppose we want to compute the actual probability of a student getting an A grade. Consider student number 10 in Table 15.7. Putting the actual data for this student in the estimated logit model given in Table 15.8, the reader can check that the estimated logit value for this student is about 0.8166. Using Eq. (15.5.2), the reader can easily check that the estimated probability is 0.69351. This student's actual final grade was an A, which our model codes as Y = 1; the estimated probability of 0.69351, although not exactly 1, is reasonably close to it.

Recall the count \(R^2\) defined earlier. Table 15.9 gives you the actual and predicted values of the regressand for our illustrative example. From this table you can observe that, out of 32 observations, there were 6 incorrect predictions (students 14, 19, 24, 26, 31, and 32). Hence the count \(R^2\) value is 26/32 = 0.8125, whereas the McFadden \(R^2\) value is 0.3740. Although these two values are not directly comparable, they give you some idea about the orders of magnitude. Besides, one should not overplay the importance of goodness of fit in models where the regressand is dichotomous.
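The count \(R^2\) of Eq. (15.8.2) can be computed directly from the actual and fitted values in Table 15.9, classifying a fitted probability above 0.5 as a predicted 1. A Python sketch (the function name is illustrative):

```python
# Actual grades (1 = A) and fitted probabilities, observations 1-32 of Table 15.9.
actual = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
          0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1]
fitted = [0.02658, 0.05950, 0.18726, 0.02590, 0.56989, 0.03486, 0.02650, 0.05156,
          0.11113, 0.69351, 0.02447, 0.19000, 0.32224, 0.19321, 0.36099, 0.03018,
          0.05363, 0.03859, 0.58987, 0.66079, 0.06138, 0.90485, 0.24177, 0.85209,
          0.83829, 0.48113, 0.63542, 0.30722, 0.84170, 0.94534, 0.52912, 0.11103]


def count_r2(actual, fitted, cutoff=0.5):
    """Share of observations whose 0/1 prediction (fitted > cutoff) matches the actual value.

    Also returns the 1-based observation numbers of the incorrect predictions.
    """
    predicted = [1 if p > cutoff else 0 for p in fitted]
    wrong = [i + 1 for i, (a, yhat) in enumerate(zip(actual, predicted)) if a != yhat]
    return (len(actual) - len(wrong)) / len(actual), wrong


r2, wrong = count_r2(actual, fitted)
print(f"count R^2 = {r2:.4f}; incorrect predictions: {wrong}")
```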


Example 15.5 Who Owns a Debit Card? Logit Analysis 


We have already seen the results of the linear probability model (LPM) applied to the bank debit card data, so
let us see how the logit model does. The results are as follows:


Dependent Variable: DEBIT
Method: ML-Binary Logit (Quadratic hill climbing)
Sample: 1-60
Included observations: 60
Convergence achieved after 4 iterations
Covariance matrix computed using second derivatives

Variable    Coefficient   Std. Error   z-Statistic   Prob.
C           -0.574900     0.785787     -0.731624     0.4644
Balance      0.001248     0.000697      1.789897     0.0735
ATM         -0.120225     0.093984     -1.279205     0.2008
Interest    -1.352086     0.680988     -1.985478     0.0471

McFadden R-squared      0.080471    Mean dependent var.      0.433333
S.D. dependent var.     0.499717    S.E. of regression       0.486274
Akaike info criterion   1.391675    Sum squared resid.       13.24192
Schwarz criterion       1.531298    Log likelihood          -37.75024
Hannan-Quinn criter.    1.446289    Restr. log likelihood   -41.05391
LR statistic            6.607325    Avg. log likelihood     -0.629171
Prob. (LR statistic)    0.085525

Obs. with Dep = 0       34          Total obs.               60
Obs. with Dep = 1       26


The positive sign of Balance and the negative signs of ATM and Interest are similar to the LPM, although we cannot directly compare the two. The interpretation of the coefficients in the logit model is different from the LPM. Here, for example, since Interest is a dummy variable, receiving interest on the account (Interest = 1 rather than 0) lowers the logit by about 1.35, holding other variables constant. If we take the antilog of -1.352086, we get about 0.2587. This means that, other things being the same, the odds of holding a debit card for customers who receive interest on their account balances are only about one-fourth of the odds for customers who do not.


Table 15.9 Actual and Fitted Values Based on Regression in Table 15.8

Observation    Actual    Fitted     Residual
  1              0       0.02658    -0.02658
  2              0       0.05950    -0.05950
  3              0       0.18726    -0.18726
  4              0       0.02590    -0.02590
  5              1       0.56989     0.43011
  6              0       0.03486    -0.03486
  7              0       0.02650    -0.02650
  8              0       0.05156    -0.05156
  9              0       0.11113    -0.11113
 10              1       0.69351     0.30649
 11              0       0.02447    -0.02447
 12              0       0.19000    -0.19000
 13              0       0.32224    -0.32224
 14*             1       0.19321     0.80679
 15              0       0.36099    -0.36099
 16              0       0.03018    -0.03018
 17              0       0.05363    -0.05363
 18              0       0.03859    -0.03859
 19*             0       0.58987    -0.58987
 20              1       0.66079     0.33921
 21              0       0.06138    -0.06138
 22              1       0.90485     0.09515
 23              0       0.24177    -0.24177
 24*             0       0.85209    -0.85209
 25              1       0.83829     0.16171
 26*             1       0.48113     0.51887
 27              1       0.63542     0.36458
 28              0       0.30722    -0.30722
 29              1       0.84170     0.15830
 30              1       0.94534     0.05466
 31*             0       0.52912    -0.52912
 32*             1       0.11103     0.88897

[Residual Plot column not reproduced.]

*Incorrect predictions.
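As a check on Table 15.9 (the grade data behind Table 15.8), the count R², the proportion of correct predictions, can
be computed directly from the fitted values. The short Python sketch below assumes the usual convention that an
observation is predicted to be 1 when its fitted probability exceeds 0.5; the actual values are recovered as fitted
value plus residual.

```python
# Fitted probabilities and residuals from Table 15.9 (observations 1-32).
fitted = [0.02658, 0.05950, 0.18726, 0.02590, 0.56989, 0.03486, 0.02650, 0.05156,
          0.11113, 0.69351, 0.02447, 0.19000, 0.32224, 0.19321, 0.36099, 0.03018,
          0.05363, 0.03859, 0.58987, 0.66079, 0.06138, 0.90485, 0.24177, 0.85209,
          0.83829, 0.48113, 0.63542, 0.30722, 0.84170, 0.94534, 0.52912, 0.11103]
residual = [-0.02658, -0.05950, -0.18726, -0.02590, 0.43011, -0.03486, -0.02650, -0.05156,
            -0.11113, 0.30649, -0.02447, -0.19000, -0.32224, 0.80679, -0.36099, -0.03018,
            -0.05363, -0.03859, -0.58987, 0.33921, -0.06138, 0.09515, -0.24177, -0.85209,
            0.16171, 0.51887, 0.36458, -0.30722, 0.15830, 0.05466, -0.52912, 0.88897]

# Actual grade = fitted + residual, which is 0 or 1 up to rounding.
actual = [round(f + r) for f, r in zip(fitted, residual)]

# Predict 1 when the fitted probability exceeds 0.5, otherwise 0.
predicted = [1 if f > 0.5 else 0 for f in fitted]

correct = sum(a == p for a, p in zip(actual, predicted))
wrong = [i + 1 for i, (a, p) in enumerate(zip(actual, predicted)) if a != p]
count_r2 = correct / len(actual)
print(correct, wrong, count_r2)
```

With this convention, 26 of the 32 predictions are correct (observations 14, 19, 24, 26, 31, and 32 are
misclassified), so the count R² is 26/32 = 0.8125.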
From the estimated LR statistic we see that collectively the three variables are statistically significant at
about the 8.5 percent level. If we use the conventional 5 percent significance level, then these variables are
not jointly significant, although they are at the 10 percent level.
The McFadden R² value is quite low. Using the data, the reader can find out the value of the count R².
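The summary statistics in the logit output can be reproduced from the two log likelihoods. The Python sketch below
assumes the per-observation forms of the information criteria that the output appears to use (they reproduce the
reported Hannan-Quinn value); the chi-square tail probability uses a closed form that is valid for 3 degrees of
freedom only.

```python
import math

log_l = -37.75024    # log likelihood of the fitted logit model
log_l0 = -41.05391   # restricted log likelihood (intercept only)
n, k = 60, 4         # observations; coefficients (C, Balance, ATM, Interest)

# LR statistic for H0: all three slope coefficients are zero (3 df).
lr = 2 * (log_l - log_l0)

# McFadden R-squared = 1 - lnL / lnL0.
mcfadden_r2 = 1 - log_l / log_l0


def chi2_sf_3df(x):
    """P(chi-square with 3 df > x), using the closed form for 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_3df(lr)

# Per-observation information criteria.
aic = -2 * log_l / n + 2 * k / n
sc = -2 * log_l / n + k * math.log(n) / n
hq = -2 * log_l / n + 2 * k * math.log(math.log(n)) / n

print(f"LR = {lr:.5f}, p = {p_value:.4f}, McFadden R2 = {mcfadden_r2:.6f}")
print(f"AIC = {aic:.6f}, SC = {sc:.6f}, HQ = {hq:.6f}")
```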


As noted earlier, unlike in the LPM, the slope coefficients do not give us the rate of change of probability for
a unit change in the regressor. We have to calculate these rates as shown in Table 15.6. Fortunately, this manual
task is not necessary, for statistical packages such as STATA can do it routinely. For our example, the results
are as follows:


Marginal effects after logit
      y = Pr(debit) (predict)
        = .42512423

Variable  |     dy/dx    Std. Err.      z     P>|z|   [    95% C.I.    ]         X
Balance   |   .000305     .00017      1.79    0.073   -.000029   .000639    1499.787
Interest* |  -.2993972    .12919     -2.32    0.020   -.552595  -.046199     .266667
ATM       |  -.0293822    .02297     -1.28    0.201   -.074396   .015632   (illegible)

(*) dy/dx is for discrete change of dummy variable from 0 to 1.


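The marginal effect of 0.000305 for Balance suggests that a one-unit increase in the account balance raises the
probability of owning a debit card by about 0.0003, or 0.03 percentage point. Because Interest is a dummy variable, its
marginal effect of about -0.30 means that customers whose accounts pay interest are about 30 percentage points less
likely to own a debit card than otherwise similar customers whose accounts do not. The marginal effect of ATM,
although statistically insignificant, suggests that if ATM transactions go up by one, the probability of owning a
debit card goes down by about 0.029, or 2.9 percentage points. All these effects are evaluated at the sample means
of the regressors (the X column).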
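These marginal effects can be reproduced by hand from the logit coefficients. The Python sketch below uses only two
numbers from the STATA output, the predicted probability at the means (0.42512423) and the mean of Interest
(0.266667). For the continuous regressors it applies beta * P(1 - P); for the Interest dummy it takes the difference
in predicted probabilities between Interest = 1 and Interest = 0, with the other regressors at their means.

```python
import math


def logistic(z):
    """Cumulative logistic function used by the logit model."""
    return 1.0 / (1.0 + math.exp(-z))


# Logit slope estimates for the debit card data.
b_balance, b_atm, b_interest = 0.001248, -0.120225, -1.352086

# Taken from the STATA output: Pr(debit) at the means and the mean of Interest.
p_bar = 0.42512423
interest_mean = 0.266667

# Continuous regressors: dP/dX = beta * P * (1 - P), evaluated at the means.
scale = p_bar * (1 - p_bar)
me_balance = b_balance * scale
me_atm = b_atm * scale

# Dummy regressor: change in P when Interest goes from 0 to 1,
# with Balance and ATM held at their means.
z_bar = math.log(p_bar / (1 - p_bar))        # logit index at the means
z0 = z_bar - b_interest * interest_mean      # index with Interest = 0
me_interest = logistic(z0 + b_interest) - logistic(z0)

print(f"Balance {me_balance:.6f}  ATM {me_atm:.7f}  Interest {me_interest:.7f}")
```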


15.9 The Probit Model 


As we have noted, to explain the behavior of a dichotomous dependent variable we will have to use a suitably
chosen cumulative distribution function (CDF). The logit model uses the cumulative logistic function, as
shown in Eq. (15.5.2). But this is not the only CDF that one can use. In some applications, the normal CDF
has been found useful. The estimating model that emerges from the normal CDF²⁸ is popularly known as the
probit model, although sometimes it is also known as the normit model. In principle one could substitute the
normal CDF in place of the logistic CDF in Eq. (15.5.2) and proceed as in Section 15.5. Instead of following
this route, we will present the probit model based on utility theory, or the rational choice perspective on
behavior, as developed by McFadden.²⁹

To motivate the probit model, assume that in our home ownership example the decision of the ith family
to own a house or not depends on an unobservable utility index $I_i$ (also known as a latent variable) that is
determined by one or more explanatory variables, say income $X_i$, in such a way that the larger the value of the
index $I_i$, the greater the probability of a family owning a house. We express the index $I_i$ as

$$I_i = \beta_1 + \beta_2 X_i \qquad (15.9.1)$$

where $X_i$ is the income of the ith family.

How is the (unobservable) index related to the actual decision to own a house? As before, let Y = 1 if
the family owns a house and Y = 0 if it does not. Now it is reasonable to assume that there is a critical or
threshold level of the index, call it $I_i^*$, such that if $I_i$ exceeds $I_i^*$, the family will own a house;
otherwise it will not. The threshold $I_i^*$, like $I_i$, is not observable, but if we assume that it is normally
distributed with the same mean and variance, it is possible not only to estimate the parameters of the index given
in Eq. (15.9.1) but also to get some information about the unobservable index itself. This calculation is as follows.


²⁸ See Appendix A for a discussion of the normal CDF. Briefly, if a variable X follows the normal distribution with
mean μ and variance σ², its PDF is

$$f(X) = \frac{1}{\sqrt{2\sigma^2\pi}}\, e^{-(X-\mu)^2/2\sigma^2}$$

and its CDF is

$$F(X_0) = \int_{-\infty}^{X_0} \frac{1}{\sqrt{2\sigma^2\pi}}\, e^{-(X-\mu)^2/2\sigma^2}\, dX$$

where $X_0$ is some specified value of X.

²⁹ D. McFadden, “Conditional Logit Analysis of Qualitative Choice Behavior,” in P. Zarembka (ed.), Frontiers in
Econometrics, Academic Press, New York, 1973.
Given the assumption of normality, the probability that $I_i^*$ is less than or equal to $I_i$ can be computed from
the standardized normal CDF as:³⁰

$$P_i = P(Y = 1 \mid X) = P(I_i^* \le I_i) = P(Z_i \le \beta_1 + \beta_2 X_i) = F(\beta_1 + \beta_2 X_i) \qquad (15.9.2)$$

where $P(Y = 1 \mid X)$ means the probability that an event occurs given the value(s) of the X, or explanatory,
variable(s) and where $Z_i$ is the standard normal variable, i.e., $Z \sim N(0, 1)$. F is the standard normal CDF,
which written explicitly in the present context is:

$$F(I_i) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{I_i} e^{-z^2/2}\, dz = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\beta_1 + \beta_2 X_i} e^{-z^2/2}\, dz \qquad (15.9.3)$$

Since $P_i$ represents the probability that an event will occur, here the probability of owning a house, it is
measured by the area of the standard normal curve from $-\infty$ to $I_i$, as shown in Figure 15.4a.

Now to obtain information on $I_i$, the utility index, as well as on $\beta_1$ and $\beta_2$, we take the inverse of
Eq. (15.9.2) to obtain:

$$I_i = F^{-1}(F(I_i)) = F^{-1}(P_i) = \beta_1 + \beta_2 X_i \qquad (15.9.4)$$

where $F^{-1}$ is the inverse of the normal CDF. What all this means can be made clear from Figure 15.4. In panel
(a) of this figure we obtain from the ordinate the (cumulative) probability of owning a house given $I_i^* \le I_i$,
whereas in panel (b) we obtain from the abscissa the value of $I_i$ given the value of $P_i$, which is simply the
reverse of the former.

But how do we actually go about obtaining the index $I_i$ as well as estimating $\beta_1$ and $\beta_2$? As in the
case of the logit model, the answer depends on whether we have grouped data or ungrouped data. We consider the
two cases individually.

[Figure 15.4: two panels, horizontal axis running from $-\infty$ through 0 to $+\infty$, labeled
$I_i = \beta_1 + \beta_2 X_i$ in panel (a) and $I_i = F^{-1}(P_i)$ in panel (b).]
Figure 15.4 Probit model: (a) given $I_i$, read $P_i$ from the ordinate; (b) given $P_i$, read $I_i$ from the abscissa.


³⁰ A normal distribution with zero mean and unit (= 1) variance is known as a standard or standardized normal
variable (see Appendix A).
Probit Estimation with Grouped Data: gprobit


We will use the same data that we used for glogit, which is given in Table 15.4. Since we already have $\hat{P}_i$,
the relative frequency (the empirical measure of probability) of owning a house at various income levels as
shown in Table 15.5, we can use it to obtain $I_i$ from the normal CDF as shown in Table 15.10, or from Figure 15.5.


Table 15.10 Estimating the Index $I_i$ from the Standard Normal CDF

$\hat{P}_i$     $I_i = F^{-1}(\hat{P}_i)$
0.20            -0.8416
0.24            -0.7063
0.30            -0.5244
0.35            -0.3853
0.45            -0.1257
0.51             0.0251
0.60             0.2533
0.66             0.4125
0.75             0.6745
0.80             0.8416

Notes: (1) $\hat{P}_i$ are from Table 15.5; (2) $I_i$ are estimated from the standard normal CDF.

Figure 15.5 Normal CDF.


Once we have the estimated $I_i$, estimating $\beta_1$ and $\beta_2$ is relatively straightforward, as we show
shortly. In passing, note that in the language of probit analysis the unobservable utility index $I_i$ is known as
the normal equivalent deviate (n.e.d.) or simply normit. Since the n.e.d. or $I_i$ will be negative whenever
$P_i < 0.5$, in practice the number 5 is added to the n.e.d. and the result is called a probit.
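The entries of Table 15.10 are simply values of the inverse standard normal CDF. A minimal Python sketch, using the
standard library's `statistics.NormalDist` (Python 3.8 or later) in place of normal tables, computes the n.e.d. and
the corresponding probits:

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

# Relative frequencies P_i listed in Table 15.10.
p_hat = [0.20, 0.24, 0.30, 0.35, 0.45, 0.51, 0.60, 0.66, 0.75, 0.80]

# Normal equivalent deviate: I_i = F^{-1}(P_i).
ned = [std_normal.inv_cdf(p) for p in p_hat]

# Probit = n.e.d. + 5.
probit = [i + 5 for i in ned]

for p, i, pr in zip(p_hat, ned, probit):
    print(f"{p:.2f}  {i:8.4f}  {pr:7.4f}")
```

Note that every probit is positive here, which is the purpose of adding 5.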
Example 15.6 Illustration of Gprobit Using Housing Example


Let us continue with our housing example. We have already presented the results of the glogit model for this
example. The grouped probit (gprobit) results of the same data are as follows.

Using the n.e.d. (= I) given in Table 15.10, the regression results are as shown in Table 15.11.³¹ The
regression results based on the probits (= n.e.d. + 5) are as shown in Table 15.12.

Except for the intercept term, these results are identical with those given in the previous table. But this
should not be surprising. (Why?)


Table 15.11
Dependent Variable: I

Variable    Coefficient    Std. Error    t-Statistic    Probability
C           -1.0166        0.05728       -17.7473       1.0397E-07
Income       0.04846       0.00247        19.5585       4.8547E-08

R² = 0.97951    Durbin-Watson statistic = 0.91384


Table 15.12
Dependent Variable: Probit

Variable    Coefficient    Std. Error    t-Statistic    Probability
C            3.9833        0.05728        69.5336       (illegible)
Income       0.04846       0.00247        19.5585       4.8547E-08

R² = 0.9795    Durbin-Watson statistic = 0.9138

Note: These results are not corrected for heteroscedasticity (see Exercise 15.12).
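Tables 15.11 and 15.12 can be reproduced, up to rounding, by ordinary least squares of the n.e.d. on income. The
Python sketch below assumes that the ten income levels X = 6, 8, ..., 40 (thousand dollars) listed later in this
section correspond, in order, to the rows of Table 15.10; like the tables, it makes no correction for
heteroscedasticity. It also shows that adding 5 to every n.e.d. shifts only the intercept.

```python
# Income levels X (thousands of dollars) and n.e.d. values from Table 15.10.
x = [6, 8, 10, 13, 15, 20, 25, 30, 35, 40]
ned = [-0.8416, -0.7063, -0.5244, -0.3853, -0.1257, 0.0251, 0.2533, 0.4125, 0.6745, 0.8416]


def ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


b1_ned, b2_ned = ols(x, ned)                        # compare Table 15.11
b1_prob, b2_prob = ols(x, [i + 5 for i in ned])     # compare Table 15.12
print(f"n.e.d.: {b1_ned:.4f} + {b2_ned:.5f} X")
print(f"probit: {b1_prob:.4f} + {b2_prob:.5f} X")
```

Because adding a constant to the regressand leaves every deviation from its mean unchanged, the slope is identical
in the two regressions and the intercept rises by exactly 5.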


Interpretation of the Probit Estimates in Table 15.11 


How do we interpret the preceding results? Suppose we want to find out the effect of a unit change in X
(income measured in thousands of dollars) on the probability that Y = 1, that is, that a family purchases a house.
To do this, look at Eq. (15.9.2). We want to take the derivative of this function with respect to X (that is, the
rate of change of the probability with respect to income). It turns out that this derivative is:³²

$$\frac{dP_i}{dX_i} = f(\beta_1 + \beta_2 X_i)\,\beta_2 \qquad (15.9.5)$$

where $f(\beta_1 + \beta_2 X_i)$ is the standard normal probability density function evaluated at
$\beta_1 + \beta_2 X_i$. As you will realize, this evaluation will depend on the particular value of the X variables.
Let us take a value of X from Table 15.5, say, X = 6 (thousand dollars). Using the estimated values of the
parameters given in Table 15.11,


³¹ The following results are not corrected for heteroscedasticity. See Exercise 15.12 for the appropriate procedure
to correct heteroscedasticity.
³² We use the chain rule of derivatives:

$$\frac{dP_i}{dX_i} = \frac{dF(t)}{dt} \cdot \frac{dt}{dX_i}$$

where $t = \beta_1 + \beta_2 X_i$.
we thus want to find the normal density function at $f[-1.0166 + 0.04846(6)] = f(-0.72584)$. If you refer to
the normal distribution tables, you will find that for Z = -0.72584, the normal density is about 0.3066.³³
Now multiplying this value by the estimated slope coefficient of 0.04846, we obtain about 0.01486. This means
that starting with an income level of $6,000, if the income goes up by $1,000, the probability of a family
purchasing a house goes up by about 0.015, or 1.5 percentage points. (Compare this result with that given in
Table 15.6.)
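The same calculation takes a few lines of Python, with `statistics.NormalDist().pdf` standing in for the normal
tables and the estimates taken from Table 15.11:

```python
from statistics import NormalDist

b1, b2 = -1.0166, 0.04846   # gprobit estimates from Table 15.11
x = 6                        # income of $6,000 (X is in thousands of dollars)

z = b1 + b2 * x                   # index at X = 6
density = NormalDist().pdf(z)     # standard normal density f(z)
marginal_effect = density * b2    # Eq. (15.9.5): dP/dX = f(b1 + b2 X) * b2

print(f"z = {z:.5f}, f(z) = {density:.4f}, dP/dX = {marginal_effect:.5f}")
```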

As you can see from the preceding discussion, compared with the LPM and logit models, the computation 
of changes in probability using the probit model is a bit tedious. 

Instead of computing changes in probability, suppose you want to find the estimated probabilities from the
fitted gprobit model. This can be done easily. Using the estimates in Table 15.11 and inserting the values of X
from Table 15.5, the reader can check that the estimated n.e.d. values (to two decimal places) are as follows:

X                   6      8      10     13     15     20     25     30     35     40
Estimated n.e.d.   -0.73  -0.63  -0.53  -0.39  -0.29  -0.05   0.19   0.44   0.68   0.92

Now statistical packages such as MINITAB can easily compute the (cumulative) probabilities associated
with the various n.e.d.'s. For example, for X = 8 (n.e.d. of about -0.63) the estimated probability is 0.2647 and,
for X = 30 (n.e.d. of about 0.44), the estimated probability is 0.6691. If you compare these estimates with the
actual values given in Table 15.5, you will find that the two are fairly close, suggesting that the fitted model is
quite good. Graphically, what we have just done is already shown in Figure 15.4.
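The fitted probabilities for all ten income levels can be generated the same way. The Python sketch below assumes
that the relative frequencies of Table 15.10 are listed in the same order as the income levels above, and it
computes each probability from the unrounded n.e.d.

```python
from statistics import NormalDist

b1, b2 = -1.0166, 0.04846                       # Table 15.11
x = [6, 8, 10, 13, 15, 20, 25, 30, 35, 40]
p_actual = [0.20, 0.24, 0.30, 0.35, 0.45, 0.51, 0.60, 0.66, 0.75, 0.80]

Phi = NormalDist().cdf
ned_hat = [b1 + b2 * xi for xi in x]            # estimated n.e.d. at each income level
p_fitted = [Phi(z) for z in ned_hat]            # estimated probabilities

for xi, z, pf, pa in zip(x, ned_hat, p_fitted, p_actual):
    print(f"X = {xi:2d}  n.e.d. = {z:6.2f}  fitted P = {pf:.4f}  actual P = {pa:.2f}")
```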


The Probit Model for Ungrouped or Individual Data 


Let us revisit Table 15.7, which gives data on 32 individuals about their final grade in an intermediate micro- 
economics course in relation to the variables GPA, TUCE, and PSI. The results of the logit regression are 
given in Table 15.8. Let us see what the probit results look like. Notice that as in the case of the logit model 
for individual data, we will have to use a nonlinear estimating procedure based on the method of maximum 
likelihood. The regression results calculated by EViews 6 are given in Table 15.13. 

“Qualitatively,” the results of the probit model are comparable with those obtained from the logit model
in that GPA and PSI are individually statistically significant. Collectively, the coefficients are also statistically
significant, since the value of the LR statistic is 15.5458 with a p value of 0.0014. For reasons discussed in
the next section, we cannot directly compare the logit and probit regression coefficients.


Table 15.13
Dependent Variable: grade
Method: ML-Binary probit
Convergence achieved after 5 iterations

Variable    Coefficient    Std. Error    z-Statistic    Probability
C           -7.4523        2.5424        -2.9311        0.0033
GPA          1.6258        0.6938         2.3430        0.0191
TUCE         0.0517        0.0838         0.6166        0.5374
PSI          1.4263        0.5950         2.3970        0.0165

LR statistic (3 df) = 15.5458    McFadden R² = 0.3774
Probability (LR stat) = 0.0014


³³ Note that the standard normal Z can range from $-\infty$ to $+\infty$, but the density function f(Z) is always
positive.
Table 15.14
Dependent Variable: grade

Variable    Coefficient    Std. Error    t-Statistic    Probability
C           -1.4980        0.5238        -2.8594        0.0078
GPA          0.4638        0.1619         2.8640        0.0078
TUCE         0.0104        0.0194         0.5386        0.5943
PSI          0.3785        0.1391         2.7200        0.0110

R² = 0.4159    Durbin-Watson d = 2.3464    F-statistic = 6.6456


For comparative purposes, we present the results based on the linear probability model (LPM) for the 
grade data in Table 15.14. Again, qualitatively, the LPM results are similar to the logit and probit models in 
that GPA and PSI are individually statistically significant but TUCE is not. Also, together the explanatory 
variables have a significant impact on grade, as the F value of 6.6456 is statistically significant because its p 
value is only 0.0015. 


The Marginal Effect of a Unit Change in the Value of a Regressor in the Various Regression Models


In the linear regression model, the slope coefficient measures the change in the average value of the regressand 
for a unit change in the value of a regressor, with all other variables held constant. 

In the LPM, the slope coefficient measures directly the change in the probability of an event occurring 
as the result of a unit change in the value of a regressor, with the effect of all other variables held constant. 

In the logit model the slope coefficient of a variable gives the change in the log of the odds associated
with a unit change in that variable, again holding all other variables constant. But as noted previously, for the
logit model the rate of change in the probability of an event happening is given by $\beta_j P_i(1 - P_i)$, where
$\beta_j$ is the (partial regression) coefficient of the jth regressor. But in evaluating $P_i$, all the variables
included in the analysis are involved.

In the probit model, as we saw earlier, the rate of change in the probability is somewhat complicated and
is given by $\beta_j f(Z_i)$, where $f(Z_i)$ is the density function of the standard normal variable and
$Z_i = \beta_1 + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}$, that is, the regression model used in the analysis.

Thus, in both the logit and probit models all the regressors are involved in computing the changes in
probability, whereas in the LPM only the coefficient of the jth regressor is involved. This difference may be one
reason for the early popularity of the LPM. Statistical packages, such as STATA, have made the task of finding the
rate of change of probability for the logit and probit models much easier. So now there is no need to choose the
LPM just because of its simplicity.
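To see that every regressor enters the probit marginal effect, the Python sketch below evaluates
$\beta_{GPA} f(Z_i)$ with the probit estimates of Table 15.13 for two hypothetical students. Their GPA, TUCE, and PSI
values are our own illustrative choices, not observations from Table 15.7, and they differ only in PSI. In the LPM of
Table 15.14 the corresponding effect would be 0.4638 for both students.

```python
from statistics import NormalDist

# Probit estimates for the grade data (Table 15.13).
b = {"C": -7.4523, "GPA": 1.6258, "TUCE": 0.0517, "PSI": 1.4263}


def probit_me_gpa(gpa, tuce, psi):
    """Marginal effect of GPA, beta_GPA * f(Z_i); Z_i uses every regressor."""
    z = b["C"] + b["GPA"] * gpa + b["TUCE"] * tuce + b["PSI"] * psi
    return b["GPA"] * NormalDist().pdf(z)


# Two hypothetical students (illustrative values only): identical GPA and TUCE.
me_no_psi = probit_me_gpa(gpa=3.0, tuce=20, psi=0)
me_psi = probit_me_gpa(gpa=3.0, tuce=20, psi=1)
print(f"dP/dGPA without PSI: {me_no_psi:.3f}, with PSI: {me_psi:.3f}")
```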


15.10 Logit and Probit Models 


Although for our grade example LPM, logit, and probit give qualitatively similar results, we will confine
our attention to logit and probit models because of the problems with the LPM noted earlier. Between logit
and probit, which model is preferable? In most applications the models are quite similar, the main difference
being that the logistic distribution has slightly fatter tails, which can be seen from Figure 15.6. That is to say,
the conditional probability $P_i$ approaches 0 or 1 at a slower rate in logit than in probit. This can be seen more
clearly from Table 15.15. Since the two models otherwise give very similar results, there is no compelling reason
to choose one over the other. In practice many researchers choose the logit model because of its comparative
mathematical simplicity.
Figure 15.6 Logit and probit cumulative distributions.


Table 15.15 Values of Cumulative Probability Functions

Cumulative normal: $P_1(Z) = \int_{-\infty}^{Z} \frac{1}{\sqrt{2\pi}} e^{-s^2/2}\, ds$
Cumulative logistic: $P_2(Z) = \frac{1}{1 + e^{-Z}}$

  Z       P1(Z)     P2(Z)
-3.0     0.0013    0.0474
-2.0     0.0228    0.1192
-1.5     0.0668    0.1824
-1.0     0.1587    0.2689
-0.5     0.3085    0.3775
 0       0.5000    0.5000
 0.5     0.6915    0.6225
 1.0     0.8413    0.7311
 1.5     0.9332    0.8176
 2.0     0.9772    0.8808
 3.0     0.9987    0.9526
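Both columns of Table 15.15 can be regenerated with a few lines of Python, which makes the fatter logistic tails
easy to inspect for other values of Z:

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf   # cumulative normal P1(Z)


def logistic_cdf(z):
    """Cumulative logistic P2(Z) = 1 / (1 + e^(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))


z_values = [-3.0, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
rows = [(z, Phi(z), logistic_cdf(z)) for z in z_values]
for z, p1, p2 in rows:
    print(f"{z:5.1f}  {p1:.4f}  {p2:.4f}")
```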


Though the models are similar, one has to be careful in interpreting the coefficients estimated by the
two models. For example, for our grade example, the GPA coefficient of 1.6258 in the probit model (see
Table 15.13) and 2.8261 in the logit model (see Table 15.8) are not directly comparable. The reason is that,
although the standard logistic (the basis of logit) and the standard normal distributions (the basis of probit)
both have a mean value of zero, their variances are different: 1 for the standard normal (as we already know)
and $\pi^2/3$ for the logistic distribution, where $\pi \approx 22/7$. Therefore, if you multiply the probit coefficient
by about 1.81 (which is approximately $\pi/\sqrt{3}$), you will get approximately the logit coefficient. For our
example, the probit coefficient of GPA is 1.6258. Multiplying this by 1.81, we obtain 2.94, which is close
to the logit coefficient. Alternatively, if you multiply a logit coefficient by 0.55 (= 1/1.81), you will get the
probit coefficient. Amemiya, however, suggests multiplying a logit estimate by 0.625 to get a better estimate
of the corresponding probit estimate.³⁴ Conversely, multiplying a probit coefficient by 1.6 (= 1/0.625) gives
the corresponding logit coefficient.
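The conversion factors can be checked numerically. In the Python sketch below, $\pi/\sqrt{3}$ is the ratio of the
standard deviations of the two distributions. The ratio of their densities at zero, $(1/\sqrt{2\pi})/0.25 \approx 1.6$,
is offered only as one intuitive way to see why a factor near 1.6 works well near the center of the distribution; it
is our illustration, not necessarily the derivation Amemiya used.

```python
import math
from statistics import NormalDist

# Ratio of standard deviations: standard logistic (pi/sqrt(3)) to standard normal (1).
sd_ratio = math.pi / math.sqrt(3)

# Ratio of densities at zero: standard normal 1/sqrt(2*pi) to standard logistic 1/4.
density_ratio = NormalDist().pdf(0.0) / 0.25

probit_gpa = 1.6258   # Table 15.13
logit_gpa = 2.8261    # Table 15.8

print(f"pi/sqrt(3) = {sd_ratio:.4f}; probit GPA x that = {probit_gpa * sd_ratio:.4f}")
print(f"density ratio at 0 = {density_ratio:.4f}; probit GPA x 1.6 = {probit_gpa * 1.6:.4f}")
print(f"logit GPA = {logit_gpa}")
```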


³⁴ T. Amemiya, “Qualitative Response Models: A Survey,” Journal of Economic Literature, vol. 19, 1981, pp. 481-536.
Incidentally, Amemiya has also shown that the coefficients of LPM and logit models are related as follows:

$$\beta_{\text{LPM}} = 0.25\, \beta_{\text{logit}} \qquad \text{except for the intercept}$$

and

$$\beta_{\text{LPM}} = 0.25\, \beta_{\text{logit}} + 0.5 \qquad \text{for the intercept}$$


We leave it to the reader to find out if these approximations hold for our grade example. 
To conclude our discussion of LPM, logit, and probit models, we consider an extended example. 


Example 15.7 To Smoke or Not to Smoke 


To find out what factors determine whether or not a person becomes a smoker, we obtained data on 1,196
individuals.³⁵ For each individual, there is information on education, age, income, and the price of cigarettes in
1979. The dependent variable is smoker, coded 1 for smokers and 0 for nonsmokers. Exercise 15.20 asks for further
analysis of these data, which can be found in Table 15.28 on the textbook website. For comparative
purposes, we present the results based on LPM, logit, and probit models in a tabular form (see Table 15.16).
These results have been obtained from STATA version 10.


Table 15.16
Variables     LPM          Logit        Probit
Constant      1.1230       2.7450       1.7019
              (5.96)       (3.31)       (3.33)
Age          -0.0047      -0.0208      -0.0129
             (-5.70)      (-5.58)      (-5.66)
Education    -0.0206      -0.0909      -0.0562
             (-4.47)      (-4.40)      (-4.45)
Income        1.03e-06     4.72e-06     2.72e-06
              (0.63)       (0.66)       (0.62)
Pcigs79      -0.0051      -0.0223      -0.0137
             (-1.80)      (-1.79)      (-1.79)
R²            0.0388       0.0297       0.0301

Notes: Figures in the parentheses are t ratios for LPM and z ratios for logit and probit. For logit and
probit, the R² values are pseudo R² values.


Although the coefficients of the three models are not directly comparable, qualitatively they are similar.
Thus, age, education, and price of cigarettes have a negative impact on smoking and income has a positive
impact. Statistically, however, the income effect is not significantly different from zero, and the price effect is
significant only at about the 8 percent level. In Exercise 15.20, you are asked to apply the conversion factor to
render the various coefficients comparable.
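Amemiya's rules of thumb can be applied directly to the estimates in Table 15.16. The Python sketch below (income is
omitted for brevity) converts the logit estimates to approximate LPM coefficients and the probit estimates to
approximate logit coefficients.

```python
# Estimates from Table 15.16 (income omitted for brevity).
lpm = {"Constant": 1.1230, "Age": -0.0047, "Education": -0.0206, "Pcigs79": -0.0051}
logit = {"Constant": 2.7450, "Age": -0.0208, "Education": -0.0909, "Pcigs79": -0.0223}
probit = {"Constant": 1.7019, "Age": -0.0129, "Education": -0.0562, "Pcigs79": -0.0137}

results = {}
for name in lpm:
    # Amemiya: LPM ~ 0.25 * logit, plus 0.5 for the intercept.
    lpm_approx = 0.25 * logit[name] + (0.5 if name == "Constant" else 0.0)
    # Amemiya: logit ~ 1.6 * probit.
    logit_approx = 1.6 * probit[name]
    results[name] = (lpm_approx, logit_approx)
    print(f"{name:10s} LPM {lpm[name]:8.4f} vs {lpm_approx:8.4f}   "
          f"logit {logit[name]:8.4f} vs {logit_approx:8.4f}")
```

For these data the probit-to-logit rule is accurate to within about 2 percent, whereas the logit-to-LPM rule
overstates the LPM slopes by roughly 10 percent.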

In Table 15.17 we present the marginal effect of each variable on the probability of smoking for each model
type.

As you will recognize, the marginal effect of a variable on the probability of smoking for LPM is directly 
obtained from the estimated regression coefficients, but for the logit and probit models they have to be 
computed as discussed in the chapter. 

It is interesting that the marginal effects are quite similar for the three models. For example, if the level of
education goes up by one unit, on average, the probability of someone becoming a smoker goes down by about 0.02,
or 2 percentage points.


³⁵ These data are from Michael P. Murray, Econometrics: A Modern Introduction, Pearson/Addison-Wesley, Boston, 2006,
and can be downloaded from www.aw-bc.com/murray.
Table 15.17
Variables     LPM          Logit        Probit
Age          -0.0047      -0.0048      -0.0049
Education    -0.0206      -0.0213      -0.0213
Income        1.03e-06     1.11e-06     1.03e-06
Pcigs79      -0.0051      -0.0052      -0.0052

Note: The marginal effects of age and education are highly statistically significant, that of the price of
cigarettes is significant at about the 8 percent level, and that of income is not significant.


15.11 The Tobit Model


An extension of the probit model is the tobit model originally developed by James Tobin, the Nobel laureate 
economist. To explain this model, we continue with our home ownership example. In the probit model 
our concern was with estimating the probability of owning a house as a function of some socioeconomic 
variables. In the tobit model our interest is in finding out the amount of money a person or family spends on 
a house in relation to socioeconomic variables. Now we face a dilemma: If a consumer does not purchase a 
house, obviously we have no data on housing expenditure for such consumers: we have such data only on 
consumers who actually purchase a house. 

Thus consumers are divided into two groups, one consisting of, say, $n_1$ consumers about whom we have
information on the regressors (say, income, mortgage interest rate, number of people in the family, etc.) as
well as the regressand (amount of expenditure on housing) and another consisting of $n_2$ consumers about
whom we have information only on the regressors but not on the regressand. A sample in which information
on the regressand is available only for some observations is known as a censored sample.³⁶ Therefore, the
tobit model is also known as a censored regression model. Some authors call such models limited dependent
variable regression models because of the restriction put on the values taken by the regressand.

Statistically, we can express the tobit model as 


$$Y_i = \begin{cases} \beta_1 + \beta_2 X_i + u_i & \text{if RHS} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (15.11.1)$$

where RHS = right-hand side. Note: Additional X variables can be easily added to the model.

Can we estimate regression (15.11.1) using only $n_1$ observations and not worry about the remaining
$n_2$ observations? The answer is no, for the OLS estimates of the parameters obtained from the subset of
$n_1$ observations will be biased as well as inconsistent; that is, they are biased even asymptotically.³⁷

To see this, consider Figure 15.7. As the figure shows, if Y is not observed (because of censoring), all such
observations (= $n_2$), denoted by crosses, will lie on the horizontal axis. If Y is observed, the observations
(= $n_1$), denoted by dots, will lie in the X-Y plane. It is intuitively clear that if we estimate a regression line
based on the $n_1$ observations only, the resulting intercept and slope coefficients are bound to be different from
those obtained if all the $(n_1 + n_2)$ observations were taken into account.
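A small simulation makes the bias concrete. The Python sketch below uses hypothetical parameter values of our own
choosing ($\beta_1 = -5$, $\beta_2 = 1$, $\sigma = 3$, X uniform between 0 and 10), generates a latent Y, censors it
at zero as in Eq. (15.11.1), and then fits OLS both to the $n_1$ uncensored observations and to all observations with
the zeros included. In this setup both OLS slopes fall well short of the true value of 1.

```python
import random


def ols_slope(x, y):
    """Slope of the OLS regression of y on x (with an intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


random.seed(12345)
b1, b2, sigma = -5.0, 1.0, 3.0   # hypothetical true parameters
n = 20000

x = [random.uniform(0, 10) for _ in range(n)]
y_latent = [b1 + b2 * xi + random.gauss(0, sigma) for xi in x]
y = [max(v, 0.0) for v in y_latent]      # Y = 0 whenever the right-hand side is <= 0

# OLS on the n1 uncensored observations only.
pairs = [(xi, yi) for xi, yi in zip(x, y) if yi > 0]
slope_n1 = ols_slope([p[0] for p in pairs], [p[1] for p in pairs])

# OLS on all n1 + n2 observations, treating the zeros as ordinary data.
slope_all = ols_slope(x, y)

print(f"n1 = {len(pairs)}, n2 = {n - len(pairs)}")
print(f"true slope {b2}, OLS on n1: {slope_n1:.3f}, OLS on all: {slope_all:.3f}")
```

The ML (tobit) estimator, which models the probability of censoring explicitly, is designed to remove this bias.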


³⁶ A censored sample should be distinguished from a truncated sample in which information on the regressors is
available only if the regressand is observed. We will not pursue this topic here, but the interested reader may
consult William H. Greene, Econometric Analysis, Prentice Hall, 4th ed., Englewood Cliffs, NJ, Chapter 19. For an
intuitive discussion, see Peter Kennedy, A Guide to Econometrics, The MIT Press, Cambridge, Mass., 4th ed., 1998,
Chapter 16.


³⁷ The bias arises from the fact that if we consider only the $n_1$ observations and omit the others, there is no
guarantee that $E(u_i)$ will necessarily be zero. And without $E(u_i) = 0$ we cannot guarantee that the OLS estimates
will be unbiased. This bias can be readily seen from the discussion in Appendix 3A, Eqs. (4) and (5).
[Figure 15.7: vertical axis, expenditure on housing; horizontal axis, income (X). ×: expenditure data not available,
but income data available; •: both expenditure and income data available.]
Figure 15.7 Plot of amount of money consumer spends in buying a house versus income.


How then does one estimate tobit, or censored regression, models, such as Eq. (15.11.1)? The actual
mechanics involves the method of maximum likelihood, which is rather involved and is beyond the scope of
this book. But the reader can get more information about the ML method from the references.³⁸

James Heckman has proposed an alternative to the ML method, which is comparatively simple.³⁹ This
alternative consists of a two-step estimating procedure. In step 1, we first estimate the probability of a
consumer owning a house, which is done on the basis of the probit model. In step 2, we estimate the model
(15.11.1) by adding to it a variable (called the inverse Mills ratio or the hazard rate) that is derived from the
probit estimate. For the actual mechanics, see the Heckman article. The Heckman procedure yields consistent
estimates of the parameters of Eq. (15.11.1), but they are not as efficient as the ML estimates. Since most
modern statistical software packages have the ML routine, it may be preferable to use these packages rather
than the Heckman two-step procedure.
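For orientation, the variable added in the second step is usually computed from the estimated probit index $Z_i$ as
$\lambda_i = f(Z_i)/F(Z_i)$, where f and F are the standard normal PDF and CDF. The Python sketch below computes only
this ratio; it is the ingredient of step 2, not the full two-step estimator, for which the Heckman article should be
consulted.

```python
from statistics import NormalDist

std_normal = NormalDist()


def inverse_mills_ratio(z):
    """lambda(z) = f(z) / F(z) for a probit index z."""
    return std_normal.pdf(z) / std_normal.cdf(z)


for z in (-2.0, 0.0, 2.0):
    print(f"z = {z:4.1f}  lambda = {inverse_mills_ratio(z):.4f}")
```

The ratio is large when the estimated probability of being observed is small and shrinks toward zero as that
probability approaches 1.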


Illustration of the Tobit Model: Ray Fair’s Model of Extramarital Affairs⁴⁰


In an interesting and theoretically innovative article, Ray Fair collected a sample of 601 men and women then
married for the first time and analyzed their responses to a question about extramarital affairs.⁴¹ The variables
used in this study are defined as follows:
Y = number of affairs in the past year, 0, 1, 2, 3, 4-10 (coded as 7)
$Z_1$ = 0 for female and 1 for male


³⁸ See Greene, op. cit. A somewhat less technical discussion can be found in Richard Breen, Regression Models:
Censored, Sample Selected or Truncated Data, Sage Publications, Newbury Park, California, 1996.

³⁹ J. J. Heckman, “Sample Selection Bias as a Specification Error,” Econometrica, vol. 47, pp. 153-161.

⁴⁰ Ray Fair, “A Theory of Extramarital Affairs,” Journal of Political Economy, vol. 86, 1978, pp. 45-61. For the
article and the data, see http://fairmodel.econ.yale.edu/rayfair/pdf/1978DAT.ZIP.

⁴¹ In 1969 Psychology Today published a 101-question survey on sex and asked its readers to mail in their answers.
In the July 1970 issue of the magazine the survey results were discussed on the basis of about 2,000 replies that
were collected in electronic form. Ray Fair extracted the sample of 601 from these replies.
$Z_2$ = age

$Z_3$ = number of years married

$Z_4$ = children: 0 if no children and 1 if children

$Z_5$ = religiousness on a scale of 1 to 5, 1 being antireligion

$Z_6$ = education, years: grade school = 9; high school = 12; Ph.D. or other = 20

$Z_7$ = occupation, “Hollingshead” scale, 1-7

$Z_8$ = self-rating of marriage, 1 = very unhappy, 5 = very happy

Of the 601 responses, 451 individuals had no extramarital affairs, and 150 individuals had one or more
affairs.

In terms of Figure 15.7, if we plot the number of affairs on the vertical axis and, say, education on the 
horizontal axis, there will be 451 observations lying along the horizontal axis. Thus, we have a censored 
sample, and a tobit model may be appropriate. 

Table 15.18 gives estimates of the preceding model using both (the inappropriate) OLS and (the appropriate)
ML procedures. Both sets of estimates use all 601 individuals, 451 who had no affairs and 150 who had
one or more affairs. The ML method takes the censoring at zero into account explicitly but the OLS method
does not; hence the difference between the two sets of estimates. For reasons already discussed, one should rely
on the ML and not the OLS estimates. The coefficients in the two models can be interpreted like any other regression
coefficients. The negative coefficient of $Z_8$ (marital happiness) means that the higher the marital happiness,
the lower is the incidence of extramarital affairs, perhaps an unsurprising finding.


Table 15.18 OLS and Tobit Estimates of Extramarital Affairs

Explanatory Variable    OLS Estimate            Tobit Estimate
Intercept                5.8720 (5.1622)*        7.6084 (1.9479)†
Z1                       0.0540 (0.1799)         0.9457 (0.8898)
Z2                      -0.0509 (-2.2536)       -0.1926 (-2.3799)
Z3                       0.1694 (4.1109)         0.5331 (3.6368)
Z4                      -0.1426 (-0.4072)        1.0191 (0.7965)
Z5                      -0.4776 (-4.2747)       -1.6990 (-4.1906)
Z6                      -0.0137 (-0.2143)        0.0253 (0.1113)
Z7                       0.1049 (1.1803)         0.2129 (0.6631)
Z8                      -0.7118 (-5.9319)       -2.2732 (-5.4724)
R²                       0.1317                  0.1515

*The figures in the parentheses are the t values.
†The figures in the parentheses are the Z (standard normal) values.

Note: In all there are 601 observations, of which 451 have zero values for the dependent variable (number of
extramarital affairs) and 150 have nonzero values.


In passing, note that if we are interested in the probability of extramarital affairs and not in the number
of such affairs, we can use the probit model assigning Y = 0 for individuals who did not have any affairs and
Y = 1 for those who had such affairs, giving the results shown in Table 15.19. With the knowledge of probit
modeling, readers should be able to interpret the probit results given in this table on their own.

Table 15.19
Dependent Variable: YSTAR
Method: ML-Binary probit
Sample: 1-601
Included observations: 601
Convergence achieved after 5 iterations

Variable    Coefficient     Std. Error      z-Statistic     Probability
C             0.779402       0.51255         1.520638       0.1284
Z1          (illegible)    (illegible)     (illegible)      0.2087
Z2           -0.024584       0.010418       -2.359844       0.0183
Z3            0.054343       0.018809        2.889278       0.0039
Z4            0.216644       0.165168        1.31166        0.1896
Z5           -0.185468       0.051626       -3.592551       0.0003
Z6          (illegible)    (illegible)       0.381556       0.7028
Z7            0.013669       0.041404        0.330129       0.7413
Z8           -0.271791       0.053475       -5.082608       0.0000

Mean dependent var.       0.249584    S.D. dependent var.       0.433133
S.E. of regression        0.410279    Akaike info criterion     1.045584
Sum squared resid.       99.65088     Schwarz criterion         1.111453
Log likelihood         -305.1980      Hannan-Quinn criter.      1.071224
Restr. log likelihood  -337.6885      Avg. log likelihood      -0.507817
LR statistic (8 df)      64.98107     McFadden R-squared        0.096215
Probability (LR stat)     4.87E-11
Obs. with Dep = 0    451    Total obs.    601
Obs. with Dep = 1    150


15.12 Modeling Count Data: The Poisson Regression Model


There are many phenomena where the regressand is of the count type, such as the number of vacations taken
by a family per year, the number of patents received by a firm per year, the number of visits to a dentist or a
doctor per year, the number of visits to a grocery store per week, the number of parking or speeding tickets
received per year, the number of days stayed in a hospital in a given period, the number of cars passing
through a toll booth in a span of, say, 5 minutes, and so on. The underlying variable in each case is discrete,
taking only a finite number of values. Sometimes count data can also refer to rare, or infrequent, occurrences,
such as getting hit by lightning in a span of a week, winning more than one lottery within 2 weeks, or having
two or more heart attacks in a span of 4 weeks. How do we model such phenomena?

Just as the Bernoulli distribution was chosen to model the yes/no decision in the linear probability model, 
the probability distribution that is specifically suited for count data is the Poisson probability distribution.42
The pdf of the Poisson distribution is given by: 


$$ f(Y) = \frac{\mu^{Y} e^{-\mu}}{Y!} \qquad Y = 0, 1, 2, \ldots \tag{15.12.1} $$

where f(Y) denotes the probability that the variable Y takes non-negative integer values, and where Y! (read Y
factorial) stands for Y! = Y × (Y − 1) × (Y − 2) × ⋯ × 2 × 1. It can be proved that

$$ E(Y) = \mu \tag{15.12.2} $$

$$ \operatorname{var}(Y) = \mu \tag{15.12.3} $$

Notice an interesting feature of the Poisson distribution: Its variance is the same as its mean value.
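A quick numerical check of Eqs. (15.12.2) and (15.12.3): the Python sketch below evaluates Eq. (15.12.1) for a hypothetical mean μ = 3 and sums over the support. The support is truncated at Y = 99, an assumption that is harmless here because the omitted tail probabilities are negligible for this μ.

```python
import math


def poisson_pmf(y, mu):
    """f(Y) = mu**Y * exp(-mu) / Y!  for Y = 0, 1, 2, ..."""
    return mu ** y * math.exp(-mu) / math.factorial(y)


mu = 3.0              # hypothetical mean
support = range(100)  # truncated support Y = 0, ..., 99
probs = [poisson_pmf(y, mu) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
print(round(total, 6), round(mean, 6), round(variance, 6))  # 1.0 3.0 3.0
```

The probabilities sum to one, and the mean and the variance both equal μ, as the two equations state.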


42See any standard book on statistics for the details of this distribution. 
The Poisson regression model may be written as:

$$ Y_i = E(Y_i) + u_i = \mu_i + u_i \tag{15.12.4} $$

where the Y's are independently distributed as Poisson random variables with mean μi for each individual
expressed as

$$ \mu_i = E(Y_i) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} \tag{15.12.5} $$
where the X’s are some of the variables that might affect the mean value. For example, if our count variable is 
the number of visits to the Metropolitan Museum of Art in New York in a given year, this number will depend 
on variables such as income of the consumer, admission price, distance from the museum, and parking fees. 

For estimation purposes, we write the model as:

$$ Y_i = \frac{\mu_i^{Y_i} e^{-\mu_i}}{Y_i!} + u_i \tag{15.12.6} $$

with μ replaced by Eq. (15.12.5). As you can readily see, the resulting regression model will be nonlinear
in the parameters, necessitating the nonlinear regression estimation discussed in the previous chapter. Let us
consider a concrete example to see how all this works out.
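The text estimates the model with a nonlinear regression routine. As an alternative illustration, the Python sketch below estimates a Poisson regression by maximum likelihood using Newton-Raphson. It assumes the exponential mean μi = exp(β1 + β2Xi) that appears in Table 15.20 rather than the linear mean of Eq. (15.12.5), and it uses simulated data. The names poisson_draw, solve, and fit_poisson are hypothetical helpers, not functions from any package.

```python
import math
import random


def poisson_draw(lam, rng):
    """One Poisson(lam) draw by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_poisson(X, y, max_iter=100, tol=1e-10):
    """ML estimates for a Poisson regression with mean exp(X_i . beta), by Newton-Raphson.

    The first column of X is assumed to be the intercept (a column of ones).
    """
    k = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)  # start at the sample mean
    for _ in range(max_iter):
        mu = [math.exp(sum(b * v for b, v in zip(beta, row))) for row in X]
        # Score X'(y - mu) and information X' diag(mu) X of the Poisson log likelihood
        score = [sum((yi - mi) * row[j] for row, yi, mi in zip(X, y, mu)) for j in range(k)]
        info = [[sum(mi * row[i] * row[j] for row, mi in zip(X, mu)) for j in range(k)]
                for i in range(k)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Simulated (hypothetical) data with true mean exp(0.5 + 0.3 * X)
rng = random.Random(12345)
X = [[1.0, rng.uniform(0.0, 2.0)] for _ in range(2000)]
y = [poisson_draw(math.exp(0.5 + 0.3 * row[1]), rng) for row in X]
beta_hat = fit_poisson(X, y)
print([round(b, 3) for b in beta_hat])  # close to the true values [0.5, 0.3]
```

With the exponential mean, the Newton step coincides with Fisher scoring, so the same loop also produces the information matrix whose inverse gives the asymptotic standard errors.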


Example 15.8 An Illustrative Example: Geriatric Study of Frequency of Falls 


The data used here were collected by Neter et al.43 The data relate to 100 individuals 65 years of age and
older. The objective of the study was to record the number of falls (= Y) suffered by these individuals in relation
to gender (X2 = 0 for female and 1 for male), a balance index (X3), and a strength index (X4). The higher the
balance index, the more stable is the subject, and the higher the strength index, the stronger is the subject.
To find out if education or education plus aerobic exercise has any effect on the number of falls, the authors
introduced an additional variable (X1), called the intervention variable, such that X1 = 0 if only education and
X1 = 1 if education plus aerobic exercise training. The subjects were randomly assigned to the two intervention
methods.
Using EViews 6, we obtained the output in Table 15.20. 


Table 15.20


Dependent Variable: Y
Sample: 1-100
Convergence achieved after 7 iterations
Y = EXP(C(0) + C(1)*X1 + C(2)*X2 + C(3)*X3 + C(4)*X4)

          Coefficient    Std. Error    t-Statistic    Probability
C(0)        0.37020       0.3459         1.0701         0.2875
C(1)       -1.10036       0.1705        -6.4525         0.0000
C(2)       -0.02194       0.1105        -0.1985         0.8430
C(3)        0.01066       0.0027         3.9483         0.0001
C(4)        0.00927       0.00414        2.2380         0.0275

R² = 0.4857                    Adjusted R² = 0.4640
Log likelihood = -197.2096     Durbin-Watson statistic = 1.7358


Note: EXP( ) means e (the base of the natural logarithm) raised to the power of the expression in ( ).


43John Neter, Michael H. Kutner, Christopher J. Nachtsheim, and William Wasserman, Applied Linear Regression Models, Irwin,
3d ed., Chicago, 1996. The data were obtained from the data disk included in the book and refer to Exercise 14.28. 


Qualitative Response Regression Models 607 


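Interpretation of Results. Keep in mind that what we have obtained in Table 15.20 is the estimated mean
value for the ith individual, μ̂i; that is, what we have estimated is:

$$ \hat{\mu}_i = e^{0.3702 - 1.10036 X_{1i} - 0.02194 X_{2i} + 0.01066 X_{3i} + 0.00927 X_{4i}} \tag{15.12.7} $$

Notice that, as the EXP in Table 15.20 shows, the mean has been specified as an exponential function of the
regressors, which guarantees that the estimated mean value is always positive.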


To find the estimated mean value for the ith subject, we need to substitute the values of the various X variables
for that subject. For example, subject 99 had these values: Y = 4, X1 = 0, X2 = 1, X3 = 50, and X4 = 56. Putting these
values in Eq. (15.12.7), we obtain μ̂99 = 3.3538 as the estimated mean value for the 99th subject. The actual
Y value for this individual was 4.


Now if we want to find out the probability that a subject similar to subject 99 has fewer than 5 falls per year,
we can obtain it as follows:

$$
\begin{aligned}
P(Y < 5) &= P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3) + P(Y = 4) \\
&= \frac{(3.3538)^0 e^{-3.3538}}{0!} + \frac{(3.3538)^1 e^{-3.3538}}{1!} + \frac{(3.3538)^2 e^{-3.3538}}{2!}
 + \frac{(3.3538)^3 e^{-3.3538}}{3!} + \frac{(3.3538)^4 e^{-3.3538}}{4!} \\
&\approx 0.7527
\end{aligned}
$$


We can also find out the marginal, or partial, effect of a regressor on the mean value of Y as follows. In 
terms of our illustrative example, suppose we want to find out the effect of a unit increase in the strength 
index (X4) on mean Y. Since 


$$ \mu = e^{C_0 + C_1 X_{1i} + C_2 X_{2i} + C_3 X_{3i} + C_4 X_{4i}} \tag{15.12.8} $$


we want to find ∂μ/∂X4. Using the chain rule of calculus, it can be easily shown that this is equal to


$$ \frac{\partial \mu}{\partial X_4} = C_4 e^{C_0 + C_1 X_{1i} + C_2 X_{2i} + C_3 X_{3i} + C_4 X_{4i}} = C_4 \mu \tag{15.12.9} $$


That is, the rate of change of the mean value with respect to a regressor is equal to the coefficient of that 
regressor times the mean value. Of course, the mean value u will depend on the values taken by all the 
regressors in the model. This is similar to the logit and probit models we discussed earlier, where the marginal 
contribution of a variable also depended on the values taken by all the variables in the model. 
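Eq. (15.12.9) can be checked numerically. The Python sketch below plugs the Table 15.20 estimates into Eq. (15.12.8), evaluates the analytic effect C4μ at regressor values chosen only for illustration (they are hypothetical, not taken from the data set), and compares it with a central finite difference. The helper names are hypothetical.

```python
import math

# C(0), ..., C(4) as reported in Table 15.20
C = [0.37020, -1.10036, -0.02194, 0.01066, 0.00927]


def mean_falls(x1, x2, x3, x4):
    """Estimated mean exp(C0 + C1*X1 + C2*X2 + C3*X3 + C4*X4), Eq. (15.12.8)."""
    return math.exp(C[0] + C[1] * x1 + C[2] * x2 + C[3] * x3 + C[4] * x4)


def effect_of_x4(x1, x2, x3, x4):
    """Analytic marginal effect d(mu)/d(X4) = C4 * mu, Eq. (15.12.9)."""
    return C[4] * mean_falls(x1, x2, x3, x4)


x = [0, 1, 45.0, 60.0]  # hypothetical values of X1, X2, X3, X4
analytic = effect_of_x4(*x)
h = 1e-5
numeric = (mean_falls(x[0], x[1], x[2], x[3] + h)
           - mean_falls(x[0], x[1], x[2], x[3] - h)) / (2 * h)
print(f"analytic {analytic:.6f}, finite difference {numeric:.6f}")

# The effect of X4 depends on the other regressors: switching X1 from 0 to 1
# (C1 < 0) lowers mu and therefore lowers C4 * mu as well.
print(effect_of_x4(1, 1, 45.0, 60.0) < effect_of_x4(0, 1, 45.0, 60.0))  # True
```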

Returning to the statistical significance of the individual coefficients, we see that the intercept and variable
X2 are individually statistically insignificant. But note that the standard errors given in the table are asymptotic
and hence the t values are to be interpreted asymptotically. As noted previously, generally the results of all 
nonlinear iterative estimating procedures have validity in large samples only. 

In concluding our discussion of the Poisson regression model, it may be noted that the model makes 
restrictive assumptions in that the mean and the variance of the Poisson process are the same and that the 
probability of an occurrence is constant at any point in time. 


15.13 Further Topics in Qualitative Response Regression Models 


As noted at the outset, the topic of qualitative response regression models is vast. What we have presented 
in this chapter are some of the basic models in this area. For those who want to pursue this topic further, we 
discuss below very briefly some other models in this area. We will not pursue them here, for that would take 
us far away from the scope of this book. 


608 Basic Econometrics 


Ordinal Logit and Probit Models 


In the bivariate logit and probit models we were interested in modeling a yes or no response variable. But 
often the response variable, or regressand, can have more than two outcomes and very often these outcomes 
are ordinal in nature; that is, they cannot be expressed on an interval scale. Frequently, in survey-type 
research the responses are on a Likert-type scale, such as “strongly agree,” “somewhat agree,” or “strongly 
disagree.” Or the responses in an educational survey may be “less than high school,” “high school,” “college,” 
or “professional degrees.” Very often these responses are coded as 0 (less than high school), 1 (high school), 
2 (college), 3 (postgraduate). These are ordinal scales in that there is clear ranking among the categories but 
we cannot say that 2 (college education) is twice 1 (high school education) or 3 (postgraduate education) is 
three times 1 (high school education). 

To study phenomena such as the preceding, one can extend the bivariate logit and probit models to take 
into account multiple ranked categories. The arithmetic gets quite involved as we have to use multistage 
normal and logistic probability distributions to allow for the various ranked categories. For the underlying 
mathematics and some of the applications, the reader may consult the Greene and Maddala texts cited earlier. 
At a comparatively intuitive level, the reader may consult the Liao monograph.44 Software packages such as
LIMDEP, EViews, STATA, and SHAZAM have routines to estimate ordered logit and probit models. 


Multinomial Logit and Probit Models 


In the ordered probit and logit models the response variable has more than two ordered, or ranked, categories. 
But there are situations where the regressand is unordered. Take, for example, the choice of transportation 
mode to work. The choices may be bicycle, motorbike, car, bus, or train. Although these are categorical 
responses, there is no ranking or order here; they are essentially nominal in character. For another example, 
consider occupational classifications, such as unskilled, semiskilled, and highly skilled. Again, there is no 
order here. Similarly, occupational choices such as self-employed, working for a private firm, working for a 
local government, and working for the federal government are essentially nominal in character. 

The techniques of multinomial logit or probit models can be employed to study such nominal categories. 
Again, the mathematics gets a little involved. The references cited previously will give the essentials of these 
techniques. And the statistical packages cited earlier can be used to implement such models, if their use is 
required in specific cases. 




Duration Models 


Consider questions such as these: (1) What determines the duration of unemployment spells? (2) What 
determines the life of a light bulb? (3) What factors determine the duration of a strike? (4) What determines 
the survival time of an HIV-positive patient? 

Subjects such as these are the topic of duration models, popularly known as survival analysis or time-to-event
data analysis. In each of the examples cited above, the key variable is the length of time or spell length,
which is modeled as a random variable. Again the mathematics involves the CDFs and PDFs of appropriate 
probability distributions. Although the technical details can be tedious, there are accessible books on this 
subject.45 Statistical packages such as STATA and LIMDEP can easily estimate such duration models. These
packages have worked examples to aid the researcher in the use of such models. 


44Tim Futing Liao, op. cit.
45See, for example, David W. Hosmer, Jr., and Stanley Lemeshow, Applied Survival Analysis, John Wiley & Sons, New York, 1999,
Summary and Conclusions


1. Qualitative response regression models refer to models in which the response, or regressand, variable is
not quantitative or is not measured on an interval scale.

2. The simplest possible qualitative response regression model is the binary model in which the regressand 
is of the yes/no or presence/absence type. 

3. The simplest possible binary regression model is the linear probability model (LPM) in which the 
binary response variable is regressed on the relevant explanatory variables by using the standard 
OLS methodology. Simplicity may not be a virtue here, for the LPM suffers from several estimation 
problems. Even if some of the estimation problems can be overcome, the fundamental weakness of the 
LPM is that it assumes that the probability of something happening increases linearly with the level of 
the regressor. This very restrictive assumption can be avoided if we use the logit and probit models. 

4. In the logit model the dependent variable is the log of the odds ratio, which is a linear function of the 
regressors. The probability function that underlies the logit model is the logistic distribution. If the data 
are available in grouped form, we can use OLS to estimate the parameters of the logit model, provided 
we take into account explicitly the heteroscedastic nature of the error term. If the data are available at 
the individual, or micro, level, nonlinear-in-the-parameter estimating procedures are called for. 

5. If we choose the normal distribution as the appropriate probability distribution, then we can use the 
probit model. This model is mathematically a bit difficult as it involves integrals. But for all practical 
purposes, both logit and probit models give similar results. In practice, the choice therefore depends on 
the ease of computation, which is not a serious problem with sophisticated statistical packages that are 
now readily available. 

6. If the response variable is of the count type, the model that is most frequently used in applied work is 
the Poisson regression model, which is based on the Poisson probability distribution. 

7. A model that is closely related to the probit model is the tobit model, also known as a censored 
regression model. In this model, the response variable is observed only if a certain condition(s) is met. 
Thus, the question of how much one spends on a car is meaningful only if one decides to buy a car to 
begin with. However, Maddala notes that the tobit model is “applicable only in those cases where the 
latent variable [i.e., the basic variable underlying a phenomenon] can, in principle, take negative values 
and the observed zero values are a consequence of censoring and nonobservability.”46

8. There are various extensions of the binary response regression models. These include ordered probit 
and logit and nominal probit and logit models. The philosophy underlying these models is the same as 
the simpler logit and probit models, although the mathematics gets rather complicated. 

9. Finally, we considered briefly the so-called duration models in which the duration of a phenomenon, 
such as unemployment or sickness, depends on several factors. In such models, the length, or the spell 
of duration, becomes the variable of research interest. 


46G. S. Maddala, Introduction to Econometrics, 2d ed., Macmillan, New York, 1992, p. 342. 
Multiple Choice Questions


1. In the linear probability model, the
a. Regressand is dichotomous
b. Regressand is ordinal variable
c. Regressor is dichotomous
d. Regressor is ordinal variable

2. In the linear probability model, the dependent variable follows
a. Normal distribution
b. Chi-square distribution
c. Bernoulli probability distribution
d. Logistic distribution

3. In LPM, the error term follows
a. Normal distribution
b. Chi-square distribution
c. Bernoulli probability distribution
d. Logistic distribution

4. The probability distribution underlying the logit model is
a. Normal distribution
b. Logistic distribution
c. χ² distribution
d. F distribution

5. The distribution underlying the probit model is
a. Normal distribution
b. Logistic distribution
c. χ² distribution
d. F distribution

6. In an LPM
a. The errors are homoscedastic
b. The errors are heteroscedastic
c. The errors are normally distributed
d. The errors are all equal to zero

7. In an LPM, R² is
a. Not very helpful in judging the goodness of fit of the model
b. The only statistic that is helpful in judging the goodness of fit of the model
c. Calculated as raw R²
d. Gives the true model

8. E(Yi | Xi) for LPM must lie between
a. −1 and +1
b. 0 and 1
c. −1 and 0
d. 0 and 2
9. In LPM, the incremental effect of X on Y
a. Increases with higher X values
b. Decreases with higher X values
c. Remains constant throughout
d. Any of the above
10. In the logit model, as Pi goes from 0 to 1, the logit L varies from
a. 0 to +∞
b. −∞ to +∞
c. 0 to 1
d. −∞ to 0
11. In the logit model, the log of the odds ratio is
a. Linear in X and linear in parameters
b. Linear in X and nonlinear in parameters
c. Nonlinear in X and nonlinear in parameters
d. Nonlinear in X and linear in parameters
12. In the logit model, if L is positive, it means that when the value of the explanatory variable(s) increases, the
probability that the event of interest will occur
a. Increases
b. Decreases
c. Remains the same
d. Cannot say
13. In the logit model, as the odds ratio decreases from 1 to 0, the logit becomes
a. Negative
b. Equal to 0
c. Fraction
d. Positive
14. Estimation of the logit model with individual data at the micro level is done using
a. Ordinary least squares method
b. Maximum likelihood method
c. Weighted least squares
d. Two-step method
15. The null hypothesis that all slope coefficients are simultaneously equal to zero is tested in the logit model by
a. F test
b. t test
c. Chi-square test
d. Likelihood ratio statistic
16. The utility index Ii in a probit model is also known as
a. Critical index
b. Latent variable
c. Threshold level
d. Cumulative distribution function
17. The estimating procedure for the probit model with individual ungrouped data is based on the method of
a. Ordinary least squares method
b. Maximum likelihood method
c. Weighted least squares
d. Two-step method
18. The slope coefficient in LPM measures
a. The change in the average value of the regressand for a unit change of a regressor
b. The change in the log of the odds associated with a unit change in value of a regressor
c. The change in the probability of an event occurring as a result of a unit change in value of a regressor
d. Elasticity of change
19. The slope coefficient in the logit model measures
a. The change in the average value of the regressand for a unit change of a regressor
b. The change in the probability of an event occurring as a result of a unit change in value of a regressor
c. Elasticity of change
d. The change in the log of the odds associated with a unit change in value of a regressor
20. A model that uses censored data is the
a. LPM model
b. Logit model
c. Probit model
d. Tobit model
21. A censored sample is the same as a truncated sample. This statement is
a. True
b. False
c. Depends on censorship
d. Depends on truncation
22. The regressand in the Poisson regression model takes
a. The values 0 and 1
b. Discrete finite numbers
c. The values 0 to 1
d. Ratio values
23. The tobit model is also called a censored regression model because
a. We censor unimportant data from our dataset
b. The first paper using this model did not pass the censor board of publications
c. The regressand in the sample data has information for only some observations
d. The data on regressor(s) is censored based on the problem at hand
24. If the response variable is ordinal in nature, one may use
a. Ordinal logit and probit models
b. Multinomial logit and probit model
c. Bivariate logit and probit model
d. Tobit model
25. When the response variable is unordered with more than two categories, one may use
a. Ordinal logit and probit models
b. Multinomial logit and probit model
c. Bivariate logit and probit model
d. Tobit model
Exercises


Questions 


15.1. Refer to the data given in Table 15.2. If Ŷi is negative, assume it to be equal to 0.01 and if it is greater
than 1, assume it to be equal to 0.99. Recalculate the weights wi and estimate the LPM using WLS.
Compare your results with those given in Eq. (15.2.11) and comment.

15.2. For the home ownership data given in Table 15.1, the maximum likelihood estimates of the logit
model are as follows:

$$ \hat{L}_i = \ln\left(\frac{\hat{P}_i}{1 - \hat{P}_i}\right) = -493.54 + 32.96\,\text{income} $$

$$ t = (-0.000008) \qquad (0.000008) $$


Comment on these results, bearing in mind that all values of income above 16 (thousand dollars) 
correspond to Y = 1 and all values of income below 16 correspond to Y = 0. A priori, what would you 
expect in such a situation? 


15.3. In studying the purchase of durable goods Y (Y = 1 if purchased, Y = 0 if no purchase) as a function of
several variables for a total of 762 households, Janet A. Fisher* obtained the following LPM results:


Explanatory Variable                                Coefficient    Standard Error
Constant                                              0.1411           n/a
1957 disposable income, X1                            0.0251          0.0118
(Disposable income)², X2                             −0.0004          0.0004
Checking accounts, X3                                −0.0051          0.0108
Savings accounts, X4                                  0.0013          0.0047
U.S. savings bonds, X5                               −0.0079          0.0067
Housing status: rent, X6                             −0.0469          0.0937
Housing status: own, X7                               0.0136          0.0712
Monthly rent, X8                                     −0.7540          1.0983
Monthly mortgage payments, X9                        −0.9809          0.5162
Personal noninstallment debt, X10                    −0.0367          0.0326
Age, X11                                              0.0046          0.0084
Age squared, X12                                     −0.0001          0.0001
Marital status, X13 (1 = married)                     0.1760          0.0501
Number of children, X14                               0.0398          0.0358
(Number of children)², X15                           −0.0036          0.0072
Purchase plans, X16 (1 = planned; 0 otherwise)        0.1760          0.0384
R² = 0.1336


Notes: All financial variables are in thousands of dollars. 
Housing status: Rent (1 if rents; 0 otherwise). 
Housing status: Own (1 if owns; 0 otherwise). 
Source: Janet A. Fisher, “An Analysis of Consumer Goods Expenditure,” The Review of Economics and Statistics, vol. 64, 
no. 1, Table 1, 1962, p. 67. 


a. Comment generally on the fit of the equation. 
b. How would you interpret the coefficient of −0.0051 attached to the checking accounts variable?
How would you rationalize the negative sign for this variable? 


*“An Analysis of Consumer Goods Expenditure,” The Review of Economics and Statistics, vol. 64, no. 1, 1962, pp. 64-71. 




c. What is the rationale behind introducing the age-squared and number of children-squared variables? 
Why is the sign negative in both cases? 

d. Assuming values of zero for all but the income variable, find out the conditional probability of a 
household whose income is $20,000 purchasing a durable good. 

e. Estimate the conditional probability of owning durable good(s), given: X1 = $15,000, X3 = $3,000,
X4 = $5,000, X6 = 0, X7 = 1, X8 = $500, X9 = $300, X10 = 0, X11 = 35, X13 = 1, X14 = 2, X16 = 0.

15.4. The R² value in the labor-force participation regression given in Table 15.3 is 0.175, which is rather
low. Can you test this value for statistical significance? Which test do you use and why? Comment in
general on the value of R² in such models.

15.5. Estimate the probabilities of owning a house at the various income levels underlying the regression
(15.7.1). Plot them against income and comment on the resulting relationship.

*15.6. In the probit regression given in Table 15.11, show that the intercept is equal to −μx/σx and the slope
is equal to 1/σx, where μx and σx are the mean and standard deviation of X.

15.7. From data for 54 standard metropolitan statistical areas (SMSA), Demaris** estimated the following
logit model to explain high murder rate versus low murder rate:

$$ \ln \hat{O}_i = 1.1387 + 0.0014 P_i + 0.0561 C_i - 0.4050 R_i $$

se = (0.0009) (0.0227) (0.1568)

where O = the odds of a high murder rate, P = 1980 population size in thousands, C = population
growth rate from 1970 to 1980, R = reading quotient, and the se are the asymptotic standard errors.

a. How would you interpret the various coefficients? 

b. Which of the coefficients are individually statistically significant? 

c. What is the effect of a unit increase in the reading quotient on the odds of having a higher murder
rate?

d. What is the effect of a percentage point increase in the population growth rate on the odds of 
having a higher murder rate? 

15.8. Compare and comment on the OLS and WLS regressions in Eqs. (15.7.3) and (15.7.1). 


Empirical Exercises 


15.9. From the household budget survey of 1980 of the Dutch Central Bureau of Statistics, J. S. Cramer†
obtained the following logit model based on a sample of 2,820 households. (The results given here
are based on the method of maximum likelihood and are after the third iteration.) The purpose of the
logit model was to determine car ownership as a function of (the logarithm of) income. Car ownership
was a binary variable: Y = 1 if a household owns a car, zero otherwise.

$$ \hat{L}_i = -2.77231 + 0.347582 \ln \text{Income} $$

t = (−3.35) (4.05)
χ²(1 df) = 16.681 (p value = 0.0000)

where L̂i = estimated logit and where ln Income is the logarithm of income. The χ² measures the
goodness of fit of the model.


*Optional. 
**Demaris, op. cit., p. 46. 


†J. S. Cramer, An Introduction to the Logit Model for Economists, 2d ed., published and distributed by Timberlake Consultants
Ltd., 2001, p. 33. These results are reproduced from the statistical package PC-GIVE 10 published by Timberlake
Consultants, p. 51.
a. Interpret the estimated logit model.
b. From the estimated logit model, how would you obtain the expression for the probability of car 
ownership? 
c. What is the probability that a household with an income of $20,000 will own a car? And at an 
income level of $25,000? What is the rate of change of probability at the income level of $20,000? 
d. Comment on the statistical significance of the estimated logit model. 
15.10. Establish Eq. (15.2.8). 
15.11. In an important study of college graduation rates of all high school matriculants and Black-only
matriculants, Bowen and Bok obtained the results in Table 15.21, based on the logit model.*


Table 15.21 Logistic Regression Model Predicting Graduation Rates, 1989 Entering Cohort

                                              All Matriculants                 Black Only
                                       Parameter  Standard   Odds     Parameter  Standard   Odds
Variable                               Estimate   Error      Ratio    Estimate   Error      Ratio
Intercept                                0.957     0.052      n/a       0.455     0.112      n/a
Female                                   0.280     0.031     1.323      0.265     0.101     1.303
Black                                   −0.513     0.056     0.599
Hispanic                                −0.350     0.080     0.705
Asian                                    0.122     0.055     1.130
Other race                              −0.330     0.104     0.719
SAT > 1,299                              0.331     0.059     1.393      0.128     0.248     1.137
SAT 1,200-1,299                          0.253     0.055     1.288      0.232     0.179     1.261
SAT 1,100-1,199                          0.350     0.053     1.420      0.308     0.149     1.361
SAT 1,000-1,099                          0.192     0.054     1.211      0.141     0.136     1.151
SAT not available                       −0.330     0.127     0.719      0.048     0.349     1.050
Top 10% of high school class             0.342     0.036     1.407      0.315     0.117     1.370
High school class rank not available    −0.065     0.046     0.937     −0.065     0.148     0.937
High socioeconomic status (SES)          0.283     0.036     1.327      0.557     0.175     1.746
Low SES                                 −0.385     0.079     0.680     −0.305     0.143     0.737
SES not available                        0.110     0.050     1.116      0.031     0.172     1.031
SEL-1                                    1.092     0.058     2.979      0.712     0.161     2.038
SEL-2                                    0.193     0.036     1.212      0.280     0.119     1.323
Women's college                         −0.299     0.069     0.742      0.158     0.269     1.171
Number of observations                  32,524                          2,354
−2 log likelihood
  Restricted                            31,553                          2,667
  Unrestricted                          30,160                          2,569
Chi square                              1,393 with 18 d.f.              98 with 14 d.f.


Notes: Bold coefficients are significant at the .05 level; other coefficients are not. The omitted categories in the model are White, male, SAT < 1,000, bottom 90% of high
school class, middle SES, SEL-3, coed institution. Graduation rates are 6-year, first-school graduation rates, as defined in the notes to Appendix Table D.3.1. Institutional
selectivity categories are as defined in the notes to Appendix Table D.3.1. See Appendix B for definition of socioeconomic status (SES).

SEL-1 = institutions with mean combined SAT scores of 1,300 and above. 

SEL-2 = institutions with mean combined SAT scores between 1,150 and 1,299. 

SEL-3 = institutions with mean combined SAT scores below 1,150. 


Source: Bowen and Bok, op. cit., p. 381. 


*William G. Bowen and Derek Bok, The Shape of the River: Long-Term Consequences of Considering Race in College and
University Admissions, Princeton University Press, Princeton, NJ, 1998, p. 381.


a. What general conclusion do you draw about graduation rates of all matriculants and black-only 
matriculants? 

b. The odds ratio is the ratio of two odds. Compare two groups of all matriculants, one with a SAT 
score of greater than 1,299 and the other with a SAT score of less than 1,000 (the base category). 
The odds ratio of 1.393 means the odds of matriculants in the first category graduating from 
college are 39 percent higher than those in the latter category. Do the various odds ratios shown in 
the table accord with a priori expectations? 

c. What can you say about the statistical significance of the estimated parameters? What about the 
overall significance of the estimated model? 
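The odds-ratio columns in Table 15.21 are the exponentials of the corresponding logit coefficients, which is why an odds ratio of 1.393 corresponds to odds about 39 percent higher. The Python sketch below checks this for a few "All Matriculants" rows; small differences in the third decimal are expected because the printed estimates are rounded.

```python
import math

# (parameter estimate, reported odds ratio), "All Matriculants" columns of Table 15.21
rows = {
    "Female": (0.280, 1.323),
    "Black": (-0.513, 0.599),
    "SAT > 1,299": (0.331, 1.393),
    "SEL-1": (1.092, 2.979),
}

for name, (estimate, reported) in rows.items():
    implied = math.exp(estimate)  # odds ratio = e raised to the logit coefficient
    print(f"{name:<12} exp({estimate:+.3f}) = {implied:.3f}   table: {reported:.3f}")
```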


15.12. In the probit model given in Table 15.11 the disturbance ui has this variance:

$$ \sigma_u^2 = \frac{P_i(1 - P_i)}{N_i f_i^2} $$

where fi is the standard normal density function evaluated at F⁻¹(Pi).

a. Given the preceding variance of ui, how would you transform the model in Table 15.10 to make the
resulting error term homoscedastic? 

b. Use the data in Table 15.10 to show the transformed data. 

c. Estimate the probit model based on the transformed data and compare the results with those based 
on the original data. 

15.13. Since R² as a measure of goodness of fit is not particularly well suited for the dichotomous dependent
variable models, one suggested alternative is the χ² test described below:

$$ \chi^2 = \sum_{i=1}^{G} \frac{N_i (\hat{P}_i - P_i^*)^2}{P_i^* (1 - P_i^*)} $$

where Ni = number of observations in the ith cell
P̂i = actual probability of the event occurring (= ni/Ni)
P*i = estimated probability
G = number of cells (i.e., the number of levels at which Xi is measured, e.g., 10 in Table 15.4)

It can be shown that, for large samples, χ² is distributed according to the χ² distribution with (G − k)
df, where k is the number of parameters in the estimating model (k < G).

Apply the preceding χ² test to regression (15.7.1) and comment on the resulting goodness of fit and
compare it with the reported R² value.
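The χ² statistic of Exercise 15.13 is a single weighted sum over cells. The Python sketch below implements it for grouped data; the three cells and their fitted probabilities are hypothetical numbers chosen only to show the computation (they are not from Table 15.4 or Table 15.22), and k = 2 assumes a model with an intercept and one slope. The function name chi_square_fit is hypothetical.

```python
def chi_square_fit(N, n, p_star):
    """Sum over cells of N_i (P_hat_i - P*_i)^2 / (P*_i (1 - P*_i)), with P_hat_i = n_i / N_i."""
    stat = 0.0
    for N_i, n_i, p in zip(N, n, p_star):
        p_hat = n_i / N_i  # actual (observed) proportion in the cell
        stat += N_i * (p_hat - p) ** 2 / (p * (1 - p))
    return stat


# Hypothetical grouped data: G = 3 cells, fitted probabilities from a 2-parameter model
N = [50, 40, 60]             # observations per cell
n = [10, 30, 30]             # number of events per cell
p_star = [0.25, 0.70, 0.50]  # estimated probabilities
stat = chi_square_fit(N, n, p_star)
df = len(N) - 2              # G - k
print(round(stat, 4), df)    # 1.1429 1
```

The statistic is then compared with the critical value of the χ² distribution with G − k degrees of freedom.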
15.14. Table 15.22 gives data on the results of spraying rotenone of different concentrations on the chrysanthemum
aphis in batches of approximately fifty. Develop a suitable model to express the probability
of death as a function of the log of X, the log of dosage, and comment on the results. Also compute
the χ² test of fit discussed in Exercise 15.13.


Table 15.22 Toxicity Study and Rotenone on Chrysanthemum Aphis


Concentration, Milligrams per Liter
X        log(X)      Total, Ni    Death, ni    P̂i = ni/Ni
2.6      0.4150      50            6           0.120
3.8      0.5797      48           16           0.333
5.1      0.7076      46           24           0.522
7.7      0.8865      49           42           0.857
10.2     1.0086      50           44           0.880


Source: D. J. Finney, Probit Analysis, Cambridge University Press, London, 1964.
15.15. Thirteen applicants to a graduate program had quantitative and verbal scores on the GRE as listed in
Table 15.23. Six students were admitted to the program. 
a. Use the LPM to predict the probability of admission to the program based on quantitative and 
verbal scores in the GRE. 
b. Is this a satisfactory model? If not, what alternative(s) do you suggest? 


Table 15.23 GRE Scores 


                  GRE Aptitude Test Scores         Admitted to Graduate Program
Student Number    Quantitative, Q    Verbal, V     (Yes = 1, No = 0)
1                 760                550           1
2                 600                350           0
3                 720                320           0
4                 710                630           1
5                 530                430           0
6                 650                570           0
7                 800                500           1
8                 650                680           1
9                 520                660           0
10                800                250           0
11                670                480           0
12                670                520           1
13                780                710           1


Source: Donald F. Morrison, Applied Linear Statistical Methods, Prentice-Hall, Inc., Englewood Cliffs, NJ, 
1983, p. 279 (adapted). 


15.16. To study the effectiveness of a price discount coupon on a six-pack of a soft drink, Douglas 
Montgomery and Elizabeth Peck collected the data shown in Table 15.24. A sample of 5,500 
consumers was randomly assigned to the eleven discount categories shown in the table, 500 per 
category. The response variable is whether or not consumers redeemed the coupon within one month. 


Table 15.24 Price of Soda with Discount Coupon 


Price Discount, Xi (¢)    Sample Size, Ni    Number of Coupons Redeemed, ni
5                         500                100
7                         500                122
9                         500                147
11                        500                176
13                        500                211
15                        500                244
17                        500                277
19                        500                310
21                        500                343
23                        500                372
25                        500                391


Source: Douglas C. Montgomery and Elizabeth A. Peck, Introduction to Linear Regression Analysis, 
John Wiley & Sons, New York, 1982, p. 243 (notation changed). 


a. See if the logit model fits the data, treating the redemption rate as the dependent variable and price 
discount as the explanatory variable. 

b. See if the probit model does as well as the logit model. 

c. What is the predicted redemption rate if the price discount were 17 cents?

d. Estimate the price discount for which 70 percent of the coupons will be redeemed. 
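For parts (a), (c), and (d), one way to fit a logit to grouped data like these is weighted least squares on the empirical logits $\hat{L}_i = \ln[\hat{P}_i/(1 - \hat{P}_i)]$, using weights $w_i = N_i\hat{P}_i(1 - \hat{P}_i)$. The sketch below assumes NumPy. Maximum likelihood on the grouped data is an alternative and may give slightly different numbers. Part (d) inverts the fitted logit by solving $\hat{\beta}_1 + \hat{\beta}_2 X = \ln(0.7/0.3)$ for X.

```python
import numpy as np

# Table 15.24: price discount X_i (cents), sample size N_i, coupons redeemed n_i
X = np.arange(5, 27, 2, dtype=float)        # 5, 7, ..., 25
N = np.full(X.shape, 500.0)
n = np.array([100, 122, 147, 176, 211, 244, 277, 310, 343, 372, 391], dtype=float)

P_hat = n / N                               # observed redemption rates
L_hat = np.log(P_hat / (1.0 - P_hat))       # empirical logits
w = N * P_hat * (1.0 - P_hat)               # weights = 1 / approximate var(L_hat)

# WLS: regress sqrt(w)*L on sqrt(w) and sqrt(w)*X, with no additional constant
sw = np.sqrt(w)
A = np.column_stack([sw, sw * X])
(b1, b2), *_ = np.linalg.lstsq(A, sw * L_hat, rcond=None)

def redemption_rate(x):
    """Predicted redemption probability from the fitted logit."""
    return 1.0 / (1.0 + np.exp(-(b1 + b2 * x)))

p17 = redemption_rate(17.0)                 # part (c)
x70 = (np.log(0.7 / 0.3) - b1) / b2         # part (d): discount giving a 70% rate
print(f"b1 = {b1:.4f}, b2 = {b2:.4f}, P(17) = {p17:.3f}, X for 70% = {x70:.2f} cents")
```

Part (b) can be handled in the same way by replacing the empirical logits with empirical probits, that is, the inverse standard normal CDF of $\hat{P}_i$ (available, for example, as `scipy.stats.norm.ppf`).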

15.17. To find out who has a bank account (checking, savings, etc.) and who doesn't, John Caskey and Andrew Peterson estimated a probit model for the years 1977 and 1989, using data on U.S. households. The results are given in Table 15.25. The values of the slope coefficients given in the table measure the implied effect of a unit change in a regressor on the probability that a household has a bank account, with these marginal effects calculated at the mean values of the regressors included in the model.

a. For 1977, what is the effect of marital status on ownership of a bank account? And for 1989? Do 
these results make economic sense? 

b. Why is the coefficient for the minority variable negative for both 1977 and 1989? 

c. How can you rationalize the negative sign for the number of children variable? 

d. What does the chi-square statistic given in the table suggest? (Hint: See Exercise 15.13.) 


Table 15.25 Probit Regressions where Dependent Variable is Ownership of a Deposit Account

                                        1977 Data                      1989 Data
                                Coefficients  Implied Slope    Coefficients  Implied Slope
Constant                           -1.06                          -2.20
                                   (3.3)*                         (6.8)*
Income (thousands 1991 $)           0.030        0.002             0.025        0.002
                                   (6.9)                          (6.8)
Married                             0.127        0.008             0.235        0.023
                                   (0.8)                          (1.7)
Number of children                 -0.131       -0.009            -0.084       -0.008
                                   (3.6)                          (2.0)
Age of head of household (HH)       0.006        0.0004            0.021        0.002
                                   (1.7)                          (6.3)
Education of HH                     0.121        0.008             0.128        0.012
                                   (7.4)                          (…)
Male HH                            -0.078       -0.005            -0.144       -0.011
                                   (0.5)                          (0.9)
Minority                           -0.750       -0.050            -0.600       -0.058
                                   (6.8)                          (6.5)
Employed                            0.186        0.012             0.402        0.039
                                   (1.6)                          (3.6)
Homeowner                           0.520        0.035             0.522        0.051
                                   (4.7)                          (5.3)
Log likelihood                   -430.7                         -526.0
Chi-square statistic                408                            602
  (H0: All coefficients except constant equal zero)
Number of observations            2,025                          2,091
Percentage in sample
  with correct predictions           91                             90

*Numbers in parentheses are t statistics.

Source: John P. Caskey and Andrew Peterson, "Who Has a Bank Account and Who Doesn't: 1977 and 1989," Research Working Paper 93-10, Federal Reserve Bank of Kansas City, October 1993.
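Because the implied slopes in Table 15.25 are probit marginal effects evaluated at the regressor means, each equals $\phi(\bar{x}'\hat{\beta})\,\hat{\beta}_k$, where $\phi$ is the standard normal density. Within a given year, then, the ratio of implied slope to coefficient should be roughly constant, apart from rounding in the table. The sketch below, which assumes NumPy and SciPy, checks this directly from the table. For part (d) it also computes p-values for the chi-square statistics, on the assumption that these are likelihood-ratio statistics with 9 degrees of freedom (one for each regressor other than the constant).

```python
import numpy as np
from scipy.stats import chi2

# Table 15.25: probit coefficients and implied slopes (marginal effects at the means)
regressors = ["Income", "Married", "Children", "Age", "Education",
              "Male HH", "Minority", "Employed", "Homeowner"]
coef_1977 = np.array([0.030, 0.127, -0.131, 0.006, 0.121, -0.078, -0.750, 0.186, 0.520])
slope_1977 = np.array([0.002, 0.008, -0.009, 0.0004, 0.008, -0.005, -0.050, 0.012, 0.035])
coef_1989 = np.array([0.025, 0.235, -0.084, 0.021, 0.128, -0.144, -0.600, 0.402, 0.522])
slope_1989 = np.array([0.002, 0.023, -0.008, 0.002, 0.012, -0.011, -0.058, 0.039, 0.051])

# Each ratio estimates the common scale factor phi(xbar'b) for that year
ratio_1977 = slope_1977 / coef_1977
ratio_1989 = slope_1989 / coef_1989
for name, r77, r89 in zip(regressors, ratio_1977, ratio_1989):
    print(f"{name:10s} {r77:.3f} {r89:.3f}")

# Assumed LR test of H0: all 9 slope coefficients are zero
for year, stat in [(1977, 408.0), (1989, 602.0)]:
    print(year, "p-value:", chi2.sf(stat, df=9))
```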
15.18. Monte Carlo study. As an aid to understanding the probit model, William Becker and Donald Waldman assumed the following:*

$$E(Y \mid X) = -1 + 3X$$

Then, letting $Y_i = -1 + 3X_i + \varepsilon_i$, where $\varepsilon_i$ is assumed standard normal (i.e., zero mean and unit variance), they generated a sample of 35 observations as shown in Table 15.26.


a. From the data on Y and X given in this table, can you estimate an LPM? Remember that the true $E(Y \mid X) = -1 + 3X$.

b. Given X = 0.48, estimate $E(Y \mid X = 0.48)$ and compare it with the true $E(Y \mid X = 0.48)$. Note that $\bar{X} = 0.48$.

c. Using the data on Y* and X given in Table 15.26, estimate a probit model. You may use any statistical package you want. The authors' estimated probit model is the following:

$$\hat{Y}^*_i = -0.969 + 2.764X_i$$

Find $P(Y^* = 1 \mid X = 0.48)$, that is, $P(Y_i > 0 \mid X = 0.48)$. See if your answer agrees with the authors' answer of 0.64.

d. The sample standard deviation of the X values given in Table 15.26 is 0.31. What is the predicted change in probability if X is one standard deviation above the mean value, that is, how much larger is $P(Y^* = 1 \mid X = 0.79)$ than $P(Y^* = 1 \mid X = 0.48)$? The authors' answer for this change is 0.25.


Table 15.26 Hypothetical Data Set Generated by the Model Y = -1 + 3X + ε and Y* = 1 if Y > 0

    Y      Y*     X          Y      Y*     X
-0.3786    0    0.27     -0.3753    0    0.56
 1.1974    1    0.59      1.9701    1    0.61
-0.4648    0    0.14     -0.4054    0    0.17
 1.1400    1    0.81      2.4416    1    0.89
 0.3188    1    0.35      0.8150    1    0.65
 2.2013    1    1.00     -0.1223    0    0.23
 2.4473    1    0.80      0.1428    1    0.26
 0.1153    1    0.40     -0.6681    0    0.64
 0.4110    1    0.07      1.8286    1    0.67
 2.6950    1    0.87     -0.6459    0    0.26
 2.2009    1    0.98      2.9784    1    0.63
 0.6389    1    0.28     -2.3326    0    0.09
 4.3192    1    0.99      0.8056    1    0.54
-1.9906    0    0.04     -0.8983    0    0.74
-0.9021    0    0.37     -0.2355    0    0.17
 0.9433    1    0.94      1.1429    1    0.57
-3.2235    0    0.04     -0.2965    0    0.18
 0.1690    1    0.07


Source: William E. Becker and Donald M. Waldman, “A Graphical Interpretation of Probit Coefficients,” 
Journal of Economic Education, Fall 1989, Table 1, p. 373. 


*William E. Becker and Donald M. Waldman, "A Graphical Interpretation of Probit Coefficients," Journal of Economic Education, vol. 20, no. 4, Fall 1989, pp. 371-378.
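Parts (c) and (d) only require evaluating the standard normal CDF Φ at the authors' estimated index. The following is a minimal check, assuming SciPy's `scipy.stats.norm`. For comparison, it also computes the true probability implied by the data-generating process: because $\varepsilon$ is standard normal, $P(Y > 0 \mid X) = P(\varepsilon > 1 - 3X) = \Phi(-1 + 3X)$.

```python
from scipy.stats import norm

b1, b2 = -0.969, 2.764          # authors' estimated probit index (part c)
x_bar, s_x = 0.48, 0.31         # sample mean and standard deviation of X

def prob_y_star(x):
    """Estimated P(Y* = 1 | X = x) = Phi(b1 + b2 * x)."""
    return norm.cdf(b1 + b2 * x)

p_mean = prob_y_star(x_bar)             # part (c): X at its mean
p_up = prob_y_star(x_bar + s_x)         # part (d): X one standard deviation above the mean
p_true = norm.cdf(-1 + 3 * x_bar)       # true probability implied by Y = -1 + 3X + e

print(f"P(X=0.48) = {p_mean:.3f}, P(X=0.79) = {p_up:.3f}, change = {p_up - p_mean:.3f}")
print(f"true P(Y > 0 | X = 0.48) = {p_true:.3f}")
```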
15.19. Table 15.27 on the textbook website gives data for 2,000 women regarding work (1 = a woman works,
0 = otherwise), age, marital status (1 = married, 0 = otherwise), number of children, and education 
(number of years of schooling). Out of a total of 2,000 women, 657 were recorded as not being wage 
earners. 

a. Using these data, estimate the linear probability model (LPM). 

b. Using the same data, estimate a logit model and obtain the marginal effects of the various variables. 
c. Repeat (b) for the probit model. 

d. Which model would you choose? Why? 

15.20. For the smokers example discussed in the text (see Section 15.10) download the data from the textbook 
website in Table 15.28. See if the product of education and income (i.e., the interaction effect) has any 
effect on the probability of becoming a smoker. 

15.21. Download the data set Benign,* which is Table 15.29, from the textbook website. The variable cancer
is a dummy variable, where 1 = had breast cancer and 0 = did not have breast cancer. Using the
variables age (= age of subject), HIGD (= highest grade completed in school), CHK (= 0 if subject
did not undergo regular medical checkups and = 1 if subject did undergo regular checkups), AGPI
(= age at first pregnancy), miscarriages (= number of miscarriages), and weight (= weight of subject),
perform a logistic regression to determine whether these variables are statistically useful for predicting
whether a woman will contract breast cancer or not.


Key to Multiple Choice Questions 


1. (a) 2E) 3e) 4. (b) 5. (a) 6. (b) 7. (a) 8. (b) oE) 
10. (b) 11. (a) 12. (a) 13. (a) 14. (b) 15. (d) 16. (b) 17. (b) 18. (c) 
19. (d) 20. (d) 26) T 222 (c) 24. (a) 25. (b) 


Appendix 15A 


15A.1 Maximum Likelihood Estimation of the Logit and
Probit Models for Individual (Ungrouped) Data†


As in the text, assume that we are interested in estimating the probability that an individual owns a house, given the 
individual’s income X. We assume that this probability can be expressed by the logistic function (15.5.2), which is repro- 
duced below for convenience. 


$$P_i = \frac{1}{1 + e^{-(\beta_1 + \beta_2 X_i)}} \tag{1}$$


*Data are provided on 50 women who were diagnosed as having benign breast disease and 150 age-matched controls, with three controls per case. Trained interviewers administered a standardized structured questionnaire to collect information from each subject (see Pastides, et al. [1983] and Pastides, et al. [1985]).


†The following discussion leans heavily on John Neter, Michael H. Kutner, Christopher J. Nachtsheim, and William Wasserman, Applied Linear Statistical Models, 4th ed., Irwin, 1996, pp. 573-574.
We do not actually observe $P_i$, but only observe the outcome Y = 1, if an individual owns a house, and Y = 0, if the
individual does not own a house.


Since each $Y_i$ is a Bernoulli random variable, we can write

$$\Pr(Y_i = 1) = P_i \tag{2}$$

$$\Pr(Y_i = 0) = 1 - P_i \tag{3}$$

Suppose we have a random sample of n observations. Letting $f_i(Y_i)$ denote the probability that $Y_i = 1$ or 0, the joint probability of observing the n Y values, i.e., $f(Y_1, Y_2, \ldots, Y_n)$, is given as:

$$f(Y_1, Y_2, \ldots, Y_n) = \prod_{i=1}^{n} f_i(Y_i) = \prod_{i=1}^{n} P_i^{Y_i}(1 - P_i)^{1 - Y_i} \tag{4}$$

where Π is the product operator. Note that we can write the joint probability density function as a product of individual density functions because each $Y_i$ is drawn independently and each $Y_i$ has the same (logistic) density function. The joint probability given in Eq. (4) is known as the likelihood function (LF).
Equation (4) is a little awkward to manipulate. But if we take its natural logarithm, we obtain what is called the log 
likelihood function (LLF): 


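$$\begin{aligned}
\ln f(Y_1, Y_2, \ldots, Y_n) &= \sum_{i=1}^{n} \left[ Y_i \ln P_i + (1 - Y_i)\ln(1 - P_i) \right] \\
&= \sum_{i=1}^{n} \left[ Y_i \ln P_i - Y_i \ln(1 - P_i) + \ln(1 - P_i) \right] \\
&= \sum_{i=1}^{n} Y_i \ln\left(\frac{P_i}{1 - P_i}\right) + \sum_{i=1}^{n} \ln(1 - P_i)
\end{aligned} \tag{5}$$

From Eq. (1) it is easy to verify that

$$1 - P_i = \frac{1}{1 + e^{\beta_1 + \beta_2 X_i}} \tag{6}$$

as well as

$$\ln\left(\frac{P_i}{1 - P_i}\right) = \beta_1 + \beta_2 X_i \tag{7}$$

Using Eqs. (6) and (7), we can write the LLF (5) as:

$$\ln f(Y_1, Y_2, \ldots, Y_n) = \sum_{i=1}^{n} Y_i(\beta_1 + \beta_2 X_i) - \sum_{i=1}^{n} \ln\left[1 + e^{\beta_1 + \beta_2 X_i}\right] \tag{8}$$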


As you can see from Eq. (8), the log likelihood function is a function of the parameters $\beta_1$ and $\beta_2$, since the $X_i$ are known.

In ML our objective is to maximize the LF (or LLF), that is, to obtain the values of the unknown parameters in such a
manner that the probability of observing the given Y's is as high (maximum) as possible. For this purpose, we differentiate
Eq. (8) partially with respect to each unknown, set the resulting expressions to zero, and solve the resulting equations.
One can then apply the second-order condition of maximization to verify that the values of the parameters we have
obtained do in fact maximize the LF.

So, you have to differentiate Eq. (8) with respect to $\beta_1$ and $\beta_2$ and proceed as indicated. As you will quickly realize,
the resulting expressions become highly nonlinear in the parameters and no explicit solutions can be obtained. That is
why we will have to use one of the methods of nonlinear estimation discussed in the previous chapter to obtain numerical
solutions. Once the numerical values of $\beta_1$ and $\beta_2$ are obtained, we can easily estimate Eq. (1).
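Differentiating Eq. (8) gives the first-order conditions $\sum_i (Y_i - P_i) = 0$ and $\sum_i (Y_i - P_i)X_i = 0$. They are nonlinear in $\beta_1$ and $\beta_2$ because $P_i$ depends on both. Below is a minimal sketch of one such numerical method, Newton-Raphson, assuming NumPy. The function name `logit_mle`, the simulated data, and the "true" values -0.5 and 1.2 are all hypothetical, and statistical packages may rely on other algorithms.

```python
import numpy as np

def logit_mle(x, y, tol=1e-10, max_iter=50):
    """Maximize the logit log likelihood of Eq. (8) by Newton-Raphson:
    ln L = sum_i Y_i (b1 + b2 X_i) - sum_i ln(1 + exp(b1 + b2 X_i))."""
    X = np.column_stack([np.ones_like(x), x])   # columns: constant, X_i
    beta = np.zeros(2)                            # starting values b1 = b2 = 0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))     # P_i from Eq. (1)
        score = X.T @ (y - p)                     # first derivatives of ln L
        w = p * (1.0 - p)
        hessian = -(X * w[:, None]).T @ X         # second derivatives (negative definite)
        step = np.linalg.solve(hessian, score)
        beta = beta - step                        # Newton update: beta - H^{-1} g
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical data: 500 draws from a logit model with b1 = -0.5, b2 = 1.2
rng = np.random.default_rng(1)
x = rng.normal(size=500)
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x)))
y = (rng.uniform(size=500) < p_true).astype(float)

beta_hat = logit_mle(x, y)
print("ML estimates (b1, b2):", beta_hat)
```

Because the logit log likelihood is globally concave, Newton-Raphson started from zero normally converges in a few iterations, provided the 0/1 outcomes are not perfectly separated by X.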

The ML procedure for the probit model is similar to that for the logit model, except that in Eq. (1) we use the normal 
CDF rather than the logistic CDF. The resulting expression becomes rather complicated, but the general idea is the same. 
So, we will not pursue it any further. 


CHAPTER 16
Panel Data Regression Models


In Chapter 1 we discussed briefly the types of data that are generally available for empirical analysis, namely, 
time series, cross section, and panel. In time series data we observe the values of one or more variables 
over a period of time (e.g., GDP for several quarters or years). In cross-section data, values of one or more 
variables are collected for several sample units, or subjects, at the same point in time (e.g., crime rates for 25 
states in India for a given year). In panel data the same cross-sectional unit (say a family or a firm or a state) 
is surveyed over time. In short, panel data have space as well as time dimensions. 

We have already seen an example of this in Table 1.1, which gives data on labour productivity and wages
for 27 states of India for the years 2007-08 and 2008-09. For any given year, the data on labour productivity
and wages represent a cross-sectional sample. For any given state, there are two time series observations on
labour productivity and wages. Thus, we have in all 54 (pooled) observations on labour productivity and
wages.

Another example of panel data was given in Table 1.2, which gives data on investment, value of the firm,
and capital stock for four US companies for the period 1935-1954. The data for each company over the
period 1935-1954 constitute time series data, with 20 observations; data for all four companies for a given
year are an example of cross-section data, with only four observations; and data for all the companies for all
the years are an example of panel data, with a total of 80 observations.

There are other names for panel data, such as pooled data (pooling of time series and cross-sectional 
observations), combination of time series and cross-section data, micropanel data, longitudinal data (a 
study over time of a variable or group of subjects), event history analysis (studying the movement over time 
of subjects through successive states or conditions), and cohort analysis (e.g., following the career path of 
1965 graduates of a business school). Although there are subtle variations, all these names essentially connote 
movement over time of cross-sectional units. We will therefore use the term panel data in a generic sense 
to include one or more of these terms. And we will call regression models based on such data panel data 
regression models. 

Panel data are now being used increasingly in economic research. Some of the well-known panel data sets 
are: 

1. The Panel Study of Income Dynamics (PSID) conducted by the Institute for Social Research at the
University of Michigan. The survey started in 1968; each year the Institute collects data on some 5,000
families about various socioeconomic and demographic variables.
2. The Bureau of the Census of the Department of Commerce conducts a survey similar to PSID, called
the Survey of Income and Program Participation (SIPP). Four times a year respondents are
interviewed about their economic condition.

3. The German Socio-Economic Panel (GESOEP) studied 1,761 individuals every year between 
1984 and 2002. Information on year of birth, gender, life satisfaction, marital status, individual labor 
earnings, and annual hours of work was collected for each individual for the period 1984 to 2002. 

There are also many other surveys that are conducted by various governmental agencies, such as: 


Household, Income and Labor Dynamics in Australia Survey (HILDA) 
British Household Panel Survey (BHPS) 
Korean Labor and Income Panel Study (KLIPS) 


At the outset a warning is in order: The topic of panel data regressions is vast, and some of the
mathematics and statistics involved are quite complicated. We only hope to touch on some of the essentials of the
panel data regression models, leaving the details for the references.¹ But be forewarned that some of these
references are highly technical. Fortunately, user-friendly software packages such as LIMDEP, PC-GIVE,
SAS, STATA, SHAZAM, and EViews, among others, have made the task of actually implementing panel data
regressions quite easy.


16.1 Why Panel Data? 


What are the advantages of panel data over cross-section or time series data? Baltagi lists the following
advantages of panel data:²


1. Since panel data relate to individuals, firms, states, countries, etc., over time, there is bound to be 
heterogeneity in these units. The techniques of panel data estimation can take such heterogeneity 
explicitly into account by allowing for subject-specific variables, as we shall show shortly. We use the 
term subject in a generic sense to include microunits such as individuals, firms, states, and countries. 

2. By combining time series of cross-section observations, panel data give "more informative data, more
variability, less collinearity among variables, more degrees of freedom and more efficiency."

3. By studying the repeated cross-section of observations, panel data are better suited to study the 
dynamics of change. Spells of unemployment, job turnover, and labor mobility are better studied with 
panel data. 

4. Panel data can better detect and measure effects that simply cannot be observed in pure cross-section 
or pure time series data. For example, the effects of minimum wage laws on employment and earnings 
can be better studied if we include successive waves of minimum wage increases in the federal and/or 
state minimum wages.

5. Panel data enable us to study more complicated behavioral models. For example, phenomena such as
economies of scale and technological change can be better handled by panel data than by pure
cross-section or pure time series data.


¹Some of the references are G. Chamberlain, "Panel Data," in Handbook of Econometrics, vol. II, Z. Griliches and M. D. Intriligator, eds., North-Holland Publishers, 1984, Chapter 22; C. Hsiao, Analysis of Panel Data, Cambridge University Press, 1986; G. G. Judge, R. C. Hill, W. E. Griffiths, H. Lütkepohl, and T. C. Lee, Introduction to the Theory and Practice of Econometrics, 2d ed., John Wiley & Sons, New York, 1985, Chapter 11; W. H. Greene, Econometric Analysis, 6th ed., Prentice-Hall, Englewood Cliffs, NJ, 2008, Chapter 9; Badi H. Baltagi, Econometric Analysis of Panel Data, John Wiley and Sons, New York, 1995; and J. M. Wooldridge, Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge, Mass., 1999. For a detailed treatment of the subject with empirical applications, see Edward W. Frees, Longitudinal and Panel Data: Analysis and Applications in the Social Sciences, Cambridge University Press, New York, 2004.


²Baltagi, op. cit., pp. 3-6.
6. By making data available for several thousand units, panel data can minimize the bias that might result
if we aggregate individuals or firms into broad aggregates. 


In short, panel data can enrich empirical analysis in ways that may not be possible if we use only cross- 
section or time series data. This is not to suggest that there are no problems with panel data modeling. We 
will discuss them after we cover some theory and discuss some examples. 


16.2 Panel Data: An Illustrative Example 


To set the stage, let us consider a concrete example. Consider the data given as Table 16.1 on the textbook
website, which were originally collected by Professor Moshe Kim and are reproduced from William Greene.³
The data cover the costs of six airline firms for the period 1970-1984, for a total of 90 panel data
observations.

The variables are defined as: J = airline id; T = year id; Q = output, in revenue passenger miles, an index 
number; C = total cost, in $1,000; PF = fuel price; and LF = load factor, the average capacity utilization of 
the fleet. 

Suppose we are interested in finding out how total cost (C) behaves in relation to output (Q), fuel price 
(PF), and load factor (LF). In short, we wish to estimate an airline cost function. 

How do we go about estimating this function? Of course, we can estimate the cost function for each airline 
using the data for 1970-1984 (i.e., a time series regression). This can be accomplished with the usual ordinary 
least squares (OLS) procedure. We will have in all six cost functions, one for each airline. But then we neglect 
the information about the other airlines which operate in the same (regulatory) environment. 

We can also estimate a cross-section cost function (i.e., a cross-section regression). We will have in all 15 
cross-section regressions, one for each year. But this would not make much sense in the present context, for 
we have only six observations per year and there are three explanatory variables (plus the intercept term); 
we will have very few degrees of freedom to do a meaningful analysis. Also, we will not “exploit” the panel 
nature of our data. 

Incidentally, the panel data in our example is called a balanced panel; a panel is said to be balanced 
if each subject (firm, individuals, etc.) has the same number of observations. If each entity has a different 
number of observations, then we have an unbalanced panel. For most of this chapter, we will deal with 
balanced panels. In the panel data literature you will also come across the terms short panel and long panel.
In a short panel the number of cross-sectional subjects, N, is greater than the number of time periods, T. In a
long panel, it is T that is greater than N. As we discuss later, the estimating techniques can depend on whether 
we have a short panel or a long one. 

What, then, are the options? There are four possibilities: 


1. Pooled OLS model. We simply pool all 90 observations and estimate a “grand” regression, neglecting 
the cross-section and time series nature of our data. 

2. The fixed effects least squares dummy variable (LSDV) model. Here we pool all 90 observations, but 
allow each cross-section unit (i.e., airline in our example) to have its own (intercept) dummy variable. 

3. The fixed effects within-group model. Here also we pool all 90 observations, but for each airline we 
express each variable as a deviation from its mean value and then estimate an OLS regression on such 
mean-corrected or “de-meaned” values. 


³William H. Greene, Econometric Analysis, 6th ed., 2008. Data are located at http://pages.stern.nyu.edu/~wgreen/Text/econometricanalysis.htm.
4. The random effects model (REM). Unlike the LSDV model, in which we allow each airline to have
its own (fixed) intercept value, we assume that the intercept values are a random drawing from a much 
bigger population of airlines. 


We now discuss each of these methods using the data given in Table 16.1. (See textbook website.) 


16.3 Pooled OLS Regression or Constant Coefficients Model 


Consider the following model: 


$$C_{it} = \beta_1 + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + u_{it} \tag{16.3.1}$$

$i = 1, 2, \ldots, 6$; $t = 1, 2, \ldots, 15$


where i is the ith subject and t is the time period for the variables we defined previously. We have chosen the
linear cost function for illustrative purposes, but in Exercise 16.10 you are asked to estimate a log-linear, or
double-log, function, in which case the slope coefficients will give the elasticity estimates.

Notice that we have pooled all 90 observations, but we are assuming that the regression
coefficients are the same for all the airlines. That is, there is no distinction between the airlines; one airline
is as good as the other, an assumption that may be difficult to maintain.

It is assumed that the explanatory variables are nonstochastic. If they are stochastic, they are assumed to be uncorrelated
with the error term. Sometimes it is assumed that the explanatory variables are strictly exogenous. A variable
is said to be strictly exogenous if it does not depend on current, past, and future values of the error term $u_{it}$.

It is also assumed that the error term is $u_{it} \sim \text{iid}(0, \sigma_u^2)$, that is, it is independently and identically
distributed with zero mean and constant variance. For the purpose of hypothesis testing, it may be assumed
that the error term is also normally distributed. Notice the double-subscripted notation in Eq. (16.3.1), which
should be self-explanatory.

Let us first present the results of the estimated equation (16.3.1) and then discuss some of the problems 
with this model. The regression results based on EViews, Version 6 are presented in Table 16.2. 


Table 16.2

Dependent Variable: C
Method: Least Squares
Included observations: 90

                    Coefficient    Std. Error    t Statistic    Prob.
C (intercept)        1158559.      360592.7       3.212930     0.0018
Q                    2026114.      61806.95       32.78133     0.0000
PF                   1.225348      0.103722       11.81380     0.0000
LF                  -3065753.      696327.3      -4.402747     0.0000
R-squared             0.946093     Mean dependent var.     1122524.
Adjusted R-squared    0.944213     S.D. dependent var.     1192075.
S.E. of regression    281559.5     F-statistic             503.1176
Sum squared resid.    6.82E+12     Prob. (F-statistic)     0.000000
Durbin-Watson         0.434162
If you examine the results of the pooled regression and apply the conventional criteria, you will see that
all the regression coefficients are not only highly statistically significant but are also in accord with prior
expectations and that the R² value is very high. The only "fly in the ointment" is that the estimated
Durbin-Watson statistic is quite low, suggesting that perhaps there is autocorrelation and/or spatial correlation in the
data. Of course, as we know, a low Durbin-Watson could also be due to specification errors.
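For reference, the Durbin-Watson statistic reported in Table 16.2 is $d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. A minimal NumPy sketch follows; both residual series are hypothetical. When d is applied to stacked panel residuals (assumed ordered by airline and then by year), the difference taken at each switch from one airline to the next mixes residuals from two different airlines.

```python
import numpy as np

def durbin_watson(resid):
    """d = sum_{t=2}^{n} (e_t - e_{t-1})^2 / sum_{t=1}^{n} e_t^2."""
    e = np.asarray(resid, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# d near 2: no first-order autocorrelation; d well below 2 (e.g., 0.434 in
# Table 16.2): positive autocorrelation or specification error.
alternating = [1.0, -1.0, 1.0, -1.0]     # strong negative autocorrelation, d = 3
smooth = np.sin(np.linspace(0, 3, 50))   # slowly varying residuals, d close to 0
print(durbin_watson(alternating), durbin_watson(smooth))
```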

The major problem with this model is that it does not distinguish between the various airlines nor does 
it tell us whether the response of total cost to the explanatory variables over time is the same for all the 
airlines. In other words, by lumping together different airlines at different times we camouflage the heteroge- 
neity (individuality or uniqueness) that may exist among the airlines. Another way of stating this is that the 
individuality of each subject is subsumed in the disturbance term $u_{it}$. As a consequence, it is quite possible
that the error term may be correlated with some of the regressors included in the model. If that is the case, the 
estimated coefficients in Eq. (16.3.1) may be biased as well as inconsistent. Recall that one of the important 
assumptions of the classical linear regression model is that there is no correlation between the regressors and 
the disturbance or error term. 

To see how the error term may be correlated with the regressors, let us consider the following revision of 
model (16.3.1): 


$$C_{it} = \beta_1 + \beta_2 PF_{it} + \beta_3 LF_{it} + \beta_4 M_i + u_{it} \tag{16.3.2}$$


where the additional variable M = management philosophy or management quality. Of the variables included 
in Eq. (16.3.2), only the variable M is time-invariant (or time-constant) because it varies among subjects 
but is constant over time for a given subject (airline). 

Although it is time-invariant, the variable M is not directly observable and therefore we cannot measure its 
contribution to the cost function. We can, however, do this indirectly if we write Eq. (16.3.2) as 


$$C_{it} = \beta_1 + \beta_2 PF_{it} + \beta_3 LF_{it} + \alpha_i + u_{it} \tag{16.3.3}$$


where $\alpha_i$, called the unobserved, or heterogeneity, effect, reflects the impact of M on cost. Note that for
simplicity we have shown only the unobserved effect of M on cost, but in reality there may be more such 
unobserved effects, for example, the nature of ownership (privately owned or publicly owned), whether it is 
a minority-owned company, whether the CEO is a man or a woman, etc. Although such variables may differ 
among the subjects (airlines), they will probably remain the same for any given subject over the sample 
period. 

Since $\alpha_i$ is not directly observable, why not consider it random and include it in the error term $u_{it}$, and
thereby consider the composite error term $v_{it} = \alpha_i + u_{it}$? We now write Eq. (16.3.3) as:


$$C_{it} = \beta_1 + \beta_2 PF_{it} + \beta_3 LF_{it} + v_{it} \tag{16.3.4}$$


But if the $\alpha_i$ term included in the error term $v_{it}$ is correlated with any of the regressors in Eq. (16.3.4), we
have a violation of one of the key assumptions of the classical linear regression model, namely, that the error
term is not correlated with the regressors. As we know, in this situation the OLS estimates are not only biased
but also inconsistent.

There is a real possibility that the unobservable $\alpha_i$ is correlated with one or more of the regressors. For
example, the management of one airline may be astute enough to buy futures contracts on fuel to
avoid severe price fluctuations. This will have the effect of lowering the cost of airline services. Moreover,
because the same $\alpha_i$ enters the composite error in every period, it can be shown that $\text{cov}(v_{it}, v_{is}) = \sigma_\alpha^2$, $t \neq s$,
which is nonzero; therefore, the (unobserved) heterogeneity induces autocorrelation and we will have to pay attention to it. We will show
later how this problem can be handled.
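A quick simulation illustrates the covariance claim. It assumes NumPy, hypothetical values $\sigma_\alpha = \sigma_u = 1$, and $\alpha_i$ independent of $u_{it}$. The composite errors of the same subject in two different periods should then have covariance close to $\sigma_\alpha^2$, and hence correlation $\sigma_\alpha^2/(\sigma_\alpha^2 + \sigma_u^2) = 0.5$ in this case.

```python
import numpy as np

rng = np.random.default_rng(42)
N, T = 20_000, 2                       # many subjects, two periods (hypothetical sizes)
sigma_alpha, sigma_u = 1.0, 1.0        # hypothetical standard deviations

alpha = rng.normal(0.0, sigma_alpha, size=N)        # time-invariant effect alpha_i
u = rng.normal(0.0, sigma_u, size=(N, T))           # idiosyncratic error u_it
v = alpha[:, None] + u                              # composite error v_it = alpha_i + u_it

cov_ts = np.cov(v[:, 0], v[:, 1])[0, 1]             # sample cov(v_i1, v_i2), t != s
corr_ts = np.corrcoef(v[:, 0], v[:, 1])[0, 1]       # theory: sigma_a^2 / (sigma_a^2 + sigma_u^2)
print(f"cov = {cov_ts:.3f} (theory {sigma_alpha**2:.1f}), corr = {corr_ts:.3f} (theory 0.5)")
```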




The question, therefore, is how we account for the unobservable, or heterogeneity, effect(s) so that we can
obtain consistent and/or efficient estimates of the parameters of the variables of prime interest, which are
output, fuel price, and load factor in our case. Our prime interest may not be in obtaining the impact of the
unobservable variables because they remain the same for a given subject. That is why such unobservable, or
heterogeneity, effects are called nuisance parameters. How then do we proceed? It is to this question we
now turn.


16.4 The Fixed Effect Least-Squares Dummy Variable (LSDV) Model 


The least-squares dummy variable (LSDV) model allows for heterogeneity among subjects by allowing each
entity to have its own intercept value, as shown in model (16.4.1). Again, we continue with our airlines
example.


$$C_{it} = \beta_{1i} + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + u_{it} \tag{16.4.1}$$

$i = 1, 2, \ldots, 6$; $t = 1, 2, \ldots, 15$


Notice that we have put the subscript i on the intercept term to suggest that the intercepts of the six airlines 
may be different. The difference may be due to special features of each airline, such as managerial style, 
managerial philosophy, or the type of market each airline is serving. 

In the literature, model (16.4.1) is known as the fixed effects (regression) model (FEM). The term “fixed 
effects” is due to the fact that, although the intercept may differ across subjects (here the six airlines), each 
entity's intercept does not vary over time, that is, it is time-invariant. Notice that if we were to write the 
intercept as $\beta_{1it}$, it would suggest that the intercept of each entity or individual is time-variant. It may be
noted that the FEM given in Eq. (16.4.1) assumes that the (slope) coefficients of the regressors do not vary 
across individuals or over time. 

Before proceeding further, it may be useful to visualize the difference between the pooled regression 
model and the LSDV model. For simplicity assume that we want to regress total cost on output only. In Figure 
16.1 we show this cost function estimated for two airline companies separately, as well as the cost function 
if we pool the data for the two companies; this is equivalent to neglecting the fixed effects.⁴ You can see from
Figure 16.1 how the pooled regression can bias the slope estimate.

How do we actually allow for the (fixed effect) intercept to vary among the airlines? We can easily do this 
by using the dummy variable technique, particularly the differential intercept dummy technique, which we 
learned in Chapter 9. Now we write Eq. (16.4.1) as: 


$$C_{it} = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 D_{4i} + \alpha_5 D_{5i} + \alpha_6 D_{6i} + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + u_{it} \tag{16.4.2}$$


where $D_{2i} = 1$ for airline 2, 0 otherwise; $D_{3i} = 1$ for airline 3, 0 otherwise; and so on. Notice that since we have
six airlines, we have introduced only five dummy variables to avoid falling into the dummy-variable trap
(i.e., the situation of perfect collinearity). Here we are treating airline 1 as the base, or reference, category. Of
course, you can choose any airline as the reference point. As a result, the intercept $\alpha_1$ is the intercept value of
airline 1 and the other $\alpha$ coefficients represent by how much the intercept values of the other airlines differ
from the intercept value of the first airline. Thus, $\alpha_2$ tells by how much the intercept value of the second airline
differs from $\alpha_1$. The sum $(\alpha_1 + \alpha_2)$ gives the actual value of the intercept for airline 2. The intercept values
of the other airlines can be computed similarly. Keep in mind that if you want to introduce a dummy for each
airline, you will have to drop the (common) intercept; otherwise, you will fall into the dummy-variable trap.
The results of the model (16.4.2) for our data are presented in Table 16.3.

[Figure 16.1 Bias from ignoring fixed effects. Horizontal axis: output. The figure shows a separate cost function, $E(Y_{it} \mid X_{it}) = \alpha_i + \beta X_{it}$, for each of two airlines and a pooled line labeled "Biased slope when fixed effects are ignored."]

⁴Adapted from the unpublished notes of Alan Duncan.


Table 16.3

Dependent Variable: TC
Method: Least Squares
Sample: 1-90
Included observations: 90

                    Coefficient    Std. Error    t Statistic    Prob.
C (=α1)             -131236.0      350777.1      -0.374129     0.7093
Q                    3319023.      171354.1       19.36938     0.0000
PF                   0.773071      0.097319       7.943676     0.0000
LF                  -3797368.      613773.1      -6.186924     0.0000
DUM2                 601733.2      100895.7       5.963913     0.0000
DUM3                 1337180.      186171.0       7.182538     0.0000
DUM4                 1777592.      213162.9       8.339126     0.0000
DUM5                 1828252.      231229.7       7.906651     0.0000
DUM6                 1706474.      228300.9       7.474672     0.0000
R-squared             0.971642     Mean dependent var.     1122524.
Adjusted R-squared    0.968841     S.D. dependent var.     1192075.
S.E. of regression    210422.8     F-statistic             346.9188
Sum squared resid.    3.59E+12     Prob. (F-statistic)     0.000000
Log likelihood       -1226.082     Durbin-Watson stat.     0.693288
The first thing to notice about these results is that all the differential intercept coefficients are individually
highly statistically significant, suggesting that perhaps the six airlines are heterogeneous and, therefore, the 
pooled regression results given in Table 16.2 may be suspect. The values of the slope coefficients given 
in Tables 16.2 and 16.3 are also different, again casting some doubt on the results given in Table 16.2. It 
seems model (16.4.1) is better than model (16.3.1). In passing, note that OLS applied to a fixed effect model 
produces estimators that are called fixed effect estimators. 

We can provide a formal test of the two models. In relation to model (16.4.1), model (16.3.1) is a restricted 
model in that it imposes a common intercept for all the airlines. Therefore, we can use the restricted F test 
discussed in Chapter 8. Using formula (8.6.10), the reader can check that in the present case the F value is:

$$F = \frac{(0.971642 - 0.946093)/5}{(1 - 0.971642)/81} \approx 14.60$$

Note: The restricted and unrestricted $R^2$ values are obtained from Tables 16.2 and 16.3, respectively. Also note that the
number of restrictions is 5 (why?).

The null hypothesis here is that all the differential intercepts are equal to zero. The computed F value for 
5 numerator and 81 denominator df is highly statistically significant. Therefore, we reject the null hypothesis 
that all the (differential) intercepts are zero. If the F value were not statistically significant, we would have 
concluded that there is no difference in the intercepts of the six airlines. In this case, we would have pooled 
all 90 of the observations, as we did in the pooled regression given in Table 16.2. 
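The arithmetic of the restricted F test can be reproduced in a few lines. This is a minimal sketch of the $R^2$ form of the test used in the text; the function name `restricted_f` is illustrative, and the inputs are the two $R^2$ values quoted above (0.971642 for the LSDV model, 0.946093 for the pooled model), $m = 5$ restrictions, $n = 90$ observations, and $k = 9$ parameters in the unrestricted model. Evaluating the expression gives about 14.60.

```python
def restricted_f(r2_unrestricted, r2_restricted, m, n, k):
    """R-squared form of the restricted F test: [(R2_UR - R2_R)/m] / [(1 - R2_UR)/(n - k)]."""
    numerator = (r2_unrestricted - r2_restricted) / m
    denominator = (1.0 - r2_unrestricted) / (n - k)
    return numerator / denominator

# Unrestricted: LSDV model (intercept, 5 airline dummies, 3 slopes = 9 parameters)
# Restricted: pooled model with one common intercept (5 restrictions)
F = restricted_f(r2_unrestricted=0.971642, r2_restricted=0.946093, m=5, n=90, k=9)
print(f"F = {F:.2f} with (5, 81) df")   # F = 14.60 with (5, 81) df
```

The 5 percent critical value of F with (5, 81) df is roughly 2.3, so a value near 14.6 lies far in the rejection region.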

Model (16.4.1) is known as a one-way fixed effects model because we have allowed the intercepts to 
differ between airlines. But we can also allow for time effect if we believe that the cost function changes over 
time because of factors such as technological changes, changes in government regulation and/or tax policies, 
and other such effects. Such a time effect can be easily accounted for if we introduce time dummies, one for 
each year from 1970 to 1984. Since we have data for 15 years, we can introduce 14 time dummies (why?) and 
extend model (16.4.1) by adding these variables. If we do that, the model that emerges is called a two-way 
fixed effects model because we have allowed for both individual and time effects. 

In the present example, if we add the time dummies, we will have in all 23 coefficients to estimate: the
common intercept, five airline dummies, 14 time dummies, and three slope coefficients. As you can see, we
will consume several degrees of freedom. Furthermore, if we decide to allow the slope coefficients to differ
among the companies, we can interact the five firm (airline) dummies with each of the three explanatory
variables and introduce differential slope dummy coefficients. Then we will have to estimate 15 additional
coefficients (five dummies interacted with three explanatory variables). As if this were not enough, if we interact
the 14 time dummies with the three explanatory variables, we will have 42 more coefficients to
estimate. That makes 80 coefficients in all from only 90 observations, leaving just 10 degrees of freedom.
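The degrees-of-freedom bookkeeping in the preceding paragraph can be tallied explicitly. This sketch simply counts parameters for the two-way model and its slope interactions; the variable names are illustrative.

```python
n_obs, n_firms, n_years, n_slopes = 90, 6, 15, 3

intercept = 1
firm_dummies = n_firms - 1          # 5: one airline is the reference category
time_dummies = n_years - 1          # 14: one year is the reference category
two_way = intercept + firm_dummies + time_dummies + n_slopes   # two-way fixed effects model

firm_slope_interactions = firm_dummies * n_slopes   # differential slopes by airline
time_slope_interactions = time_dummies * n_slopes   # differential slopes by year

total = two_way + firm_slope_interactions + time_slope_interactions
print(two_way, total, n_obs - total)   # 23 80 10
```

With 80 coefficients and 90 observations, only 10 residual degrees of freedom remain, which is too few for meaningful inference.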


A Caution in the Use of the Fixed Effect LSDV Model 


As the preceding discussion suggests, the LSDV model has several problems that need to be borne in mind: 

First, if you introduce too many dummy variables, you will run up against the degrees of freedom problem. 
That is, you will lack enough observations to do a meaningful statistical analysis. Second, with many dummy 
variables in the model, both individual and interactive or multiplicative, there is always the possibility of 
multicollinearity, which might make precise estimation of one or more parameters difficult. 

Third, in some situations the LSDV may not be able to identify the impact of time-invariant variables. 
Suppose we want to estimate a wage function for a group of workers using panel data. Besides wage, a wage 
function may include age, experience, and education as explanatory variables. Suppose we also decide to add 
sex, color, and ethnicity as additional variables in the model. Since these variables will not change over time
for an individual subject, the LSDV approach may not be able to identify the impact of such time-invariant 
variables on wages. To put it differently, the subject-specific intercepts absorb all heterogeneity that may exist 
in the dependent and explanatory variables. Incidentally, the time-invariant variables are sometimes called 
nuisance variables or lurking variables. 

Fourth, we have to think carefully about the error term $u_{it}$. The results we have presented in Eqs. (16.3.1)
and (16.4.1) are based on the assumption that the error term follows the classical assumptions, namely, $u_{it} \sim N(0, \sigma^2)$. Since the index $i$ refers to cross-section observations and $t$ to time series observations, the classical
assumption for $u_{it}$ may have to be modified. There are several possibilities, including:


1. We can assume that the error variance is the same for all cross-section units, or we can assume that the
error variance is heteroscedastic.5

2. For each entity, we can assume that there is no autocorrelation over time. Thus, in our illustrative
example, we can assume that the error term of the cost function for airline #1 is non-autocorrelated, or
we can assume that it is autocorrelated, say, of the AR(1) type.

3. For a given time, it is possible that the error term for airline #1 is correlated with the error term for, say,
airline #2.6 Or we can assume that there is no such correlation.


There are also other combinations and permutations of the error term. As you will quickly realize, allowing 
one or more of these possibilities will make the analysis that much more complicated. (Space and mathematical demands preclude us from considering all the possibilities. The references in footnote 1 discuss some
of these topics.) Some of these problems may be alleviated, however, if we consider the alternatives discussed 
in the next two sections. 


16.5 The Fixed-Effect Within-Group (WG) Estimator 


One way to estimate a pooled regression is to eliminate the fixed effect, $\beta_{1i}$, by expressing the values of
the dependent and explanatory variables for each airline as deviations from their respective mean values.
Thus, for airline #1 we will obtain the sample mean values of TC, Q, PF, and LF ($\overline{TC}$, $\bar{Q}$, $\overline{PF}$, and $\overline{LF}$,
respectively) and subtract them from the individual values of these variables. The resulting values are called
"de-meaned" or mean-corrected values. We do this for each airline and then pool all the (90) mean-corrected
values and run an OLS regression.

Letting $tc_{it}$, $q_{it}$, $pf_{it}$, and $lf_{it}$ represent the mean-corrected values, we now run the regression:

$$tc_{it} = \beta_2 q_{it} + \beta_3 pf_{it} + \beta_4 lf_{it} + u_{it} \tag{16.5.1}$$

where $i = 1, 2, \ldots, 6$ and $t = 1, 2, \ldots, 15$. Note that Eq. (16.5.1) does not have an intercept term (why?).
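The de-meaning step and regression (16.5.1) can be sketched as follows. This is a minimal illustration, assuming NumPy and a simulated panel of 6 firms and 15 periods; the data, the seed, and the helper names `demean_by_group` and `within_estimator` are hypothetical, not the airline data. Because the firm effects are removed before estimation, the regression is run without an intercept.

```python
import numpy as np

def demean_by_group(values, groups):
    """Subtract each group's sample mean from that group's observations."""
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    for g in np.unique(groups):
        mask = groups == g
        out[mask] = values[mask] - values[mask].mean(axis=0)
    return out

def within_estimator(y, X, groups):
    """OLS of de-meaned y on de-meaned X with no intercept, as in Eq. (16.5.1)."""
    y_dm = demean_by_group(y, groups)
    X_dm = demean_by_group(X, groups)
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta

# Simulated balanced panel: 6 firms x 15 periods, 3 regressors (hypothetical values)
rng = np.random.default_rng(42)
n_firms, n_periods = 6, 15
groups = np.repeat(np.arange(n_firms), n_periods)
alpha = rng.normal(0.0, 5.0, n_firms)                                # firm-specific intercepts
X = rng.normal(size=(n_firms * n_periods, 3)) + alpha[groups, None]   # regressors correlated with alpha
beta_true = np.array([2.0, -1.0, 0.5])
y = alpha[groups] + X @ beta_true + rng.normal(0.0, 0.1, n_firms * n_periods)

beta_wg = within_estimator(y, X, groups)
print(beta_wg)   # close to beta_true; the firm intercepts never enter the regression
```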

Returning to our example, we obtain the results in Table 16.4. Note: The prefix DM means that the values 
are mean-corrected or expressed as deviations from their sample means. 

Note the difference between the pooled regression given in Table 16.2 and the pooled regression in Table 
16.4. The former simply ignores the heterogeneity among the six airlines, whereas the latter takes it into 
account, not by the dummy variable method, but by eliminating it by differencing sample observations around 
their sample means. The difference between the two is obvious, as shown in Figure 16.2. 


5STATA provides heteroscedasticity-corrected standard errors in the panel data regression models.


6This leads to the so-called seemingly unrelated regression (SURE) model, originally proposed by Arnold Zellner. See
A. Zellner, “An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias,” Journal of
the American Statistical Association, vol. 57, 1962, pp. 348-368.
It can be shown that the WG estimator produces consistent estimates of the slope coefficients, whereas the
ordinary pooled regression may not. It should be added, however, that WG estimators, although consistent,
are inefficient (i.e., have larger variances) compared to the ordinary pooled regression results.7 Observe that
the slope coefficients of Q, PF, and LF are identical in Tables 16.3 and 16.4. This is because mathematically
the two models are identical. Incidentally, the regression coefficients estimated by the WG method are
called WG estimators.
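The claim that the LSDV and WG regressions give identical slope coefficients can be checked numerically. The sketch below assumes NumPy and uses simulated data (all values hypothetical); it fits both versions and compares the slopes and the residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n_firms, n_periods, k = 6, 15, 3
n = n_firms * n_periods
groups = np.repeat(np.arange(n_firms), n_periods)
alpha = rng.normal(0.0, 3.0, n_firms)
X = rng.normal(size=(n, k)) + alpha[groups, None]
y = alpha[groups] + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

# LSDV: common intercept, five firm dummies (firm 0 is the reference), then the regressors
dummies = (groups[:, None] == np.arange(1, n_firms)[None, :]).astype(float)
Z = np.column_stack([np.ones(n), dummies, X])
coef_lsdv, *_ = np.linalg.lstsq(Z, y, rcond=None)
slopes_lsdv = coef_lsdv[-k:]
resid_lsdv = y - Z @ coef_lsdv

# Within-group: subtract firm means, then OLS with no intercept
means_y = np.array([y[groups == g].mean() for g in range(n_firms)])
means_X = np.array([X[groups == g].mean(axis=0) for g in range(n_firms)])
y_dm, X_dm = y - means_y[groups], X - means_X[groups]
slopes_wg, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
resid_wg = y_dm - X_dm @ slopes_wg

print(np.allclose(slopes_lsdv, slopes_wg))   # True
print(np.allclose(resid_lsdv, resid_wg))     # True

# Same residual sum of squares, different residual degrees of freedom
ssr = np.sum(resid_wg ** 2)
ratio = np.sqrt(ssr / (n - (1 + (n_firms - 1) + k))) / np.sqrt(ssr / (n - k))
print(ratio)   # sqrt(87/81), about 1.036
```

Because the residuals coincide, Tables 16.3 and 16.4 report the same sum of squared residuals (3.59E+12). A plain OLS run on de-meaned data, however, divides by 90 − 3 = 87 rather than 90 − 9 = 81 degrees of freedom, which is why its S.E. of regression (203037.2) is smaller than the LSDV value (210422.8) by the factor $\sqrt{87/81} \approx 1.036$; the coefficient standard errors in the two tables differ by the same factor.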


Table 16.4

Dependent Variable: DMTC
Method: Least Squares
Sample: 1-90
Included observations: 90

Variable    Coefficient    Std. Error     t-Statistic    Prob.
DMQ         3319023.       165339.8       20.07396       0.0000
DMPF        0.773071       0.093903       [illegible]    0.0000
DMLF        -3797368.      [illegible]    -6.411976      0.0000

R-squared            0.929366    Mean dependent var.    2.59E-11
Adjusted R-squared   0.927743    S.D. dependent var.    755325.8
S.E. of regression   203037.2    Durbin-Watson stat.    0.693287
Sum squared resid.   3.59E+12


Figure 16.2 The within-groups estimator. 


Source: Alan Duncan, “Cross-Section and Panel Data Econometrics,” unpublished lecture notes (adapted). 


7The reason for this is that when we express variables as deviations from their mean values, the variation in these mean-corrected values will be much smaller than the variation in the original values of the variables. In that case, the variation in
the disturbance term $u_{it}$ may be relatively large, thus leading to higher standard errors of the estimated coefficients.
One disadvantage of the WG estimator can be explained with the following wage regression model:


$$W_{it} = \beta_{1i} + \beta_2 \text{Experience}_{it} + \beta_3 \text{Age}_{it} + \beta_4 \text{Gender}_{it} + \beta_5 \text{Education}_{it} + \beta_6 \text{Race}_{it} \tag{16.5.2}$$


In this wage function, variables such as gender, education, and race are time-invariant. If we use the WG
estimators, these time-invariant variables will be wiped out (because of differencing). As a result, we will
not know how wage reacts to these time-invariant variables.8 But this is the price we have to pay to avoid the
correlation between the error term ($\alpha_i$ included in $v_{it}$) and the explanatory variables.

Another disadvantage of the WG estimator is that "... it may distort the parameter values and can certainly
remove any long run effects."9 In general, when we difference a variable, we remove the long-run component
from that variable. What is left is the short-run value of that variable. We will discuss this further when we
discuss time series econometrics later in the book.

In using LSDV we obtained direct estimates of the intercepts for each airline. How can we obtain the 
estimates of the intercepts using the WG method? For the airlines example, they are obtained as follows: 

$$\hat{\alpha}_i = \overline{TC}_i - \hat{\beta}_2 \bar{Q}_i - \hat{\beta}_3 \overline{PF}_i - \hat{\beta}_4 \overline{LF}_i \tag{16.5.3}$$

where bars over the variables denote the sample mean values of the variables for the ith airline.

That is, we obtain the intercept value of the ith airline by subtracting from the mean value of the dependent
variable the mean values of the explanatory variables for that airline times the estimated slope coefficients
from the WG estimators. Note that the estimated slope coefficients remain the same for all of the airlines,
as shown in Table 16.4. It may be noted that the intercept estimated in Eq. (16.5.3) is similar to the intercept
we estimate in the standard linear regression model, which can be seen from Eq. (7.4.21). We leave it for the
reader to find the intercepts of the six airlines in the manner shown and verify that they are the same as the
intercept values derived in Table 16.3, save for rounding errors.
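Eq. (16.5.3) can be verified in the same way. This sketch, again assuming NumPy and simulated data (hypothetical values throughout), recovers each firm's intercept from the within-group slopes and the firm means, and checks the result against an LSDV fit that uses one dummy per firm and no common intercept.

```python
import numpy as np

rng = np.random.default_rng(1)
n_firms, n_periods, k = 6, 15, 3
groups = np.repeat(np.arange(n_firms), n_periods)
alpha = rng.normal(0.0, 3.0, n_firms)
X = rng.normal(size=(n_firms * n_periods, k)) + alpha[groups, None]
y = alpha[groups] + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n_firms * n_periods)

# Within-group slopes
means_y = np.array([y[groups == g].mean() for g in range(n_firms)])
means_X = np.array([X[groups == g].mean(axis=0) for g in range(n_firms)])
beta_wg, *_ = np.linalg.lstsq(X - means_X[groups], y - means_y[groups], rcond=None)

# Eq. (16.5.3): firm intercept = firm mean of y minus slopes times firm means of the regressors
alpha_wg = means_y - means_X @ beta_wg

# LSDV with one dummy per firm and no common intercept reports the intercepts directly
D = (groups[:, None] == np.arange(n_firms)[None, :]).astype(float)
coef, *_ = np.linalg.lstsq(np.column_stack([D, X]), y, rcond=None)
alpha_lsdv = coef[:n_firms]

print(np.allclose(alpha_wg, alpha_lsdv))   # True
```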

It may be noted that the estimated intercept of each airline represents the subject-specific characteristics 
of each airline, but we will not be able to identify these characteristics individually. Thus, the $\alpha_1$ intercept
for airline #1 represents the management philosophy of that airline, the composition of its board of directors, 
the personality of the CEO, the gender of the CEO, etc. All these heterogeneity characteristics are subsumed 
in the intercept value. As we will see later, such characteristics can be included in the random effects model. 

In passing, we note that an alternative to the WG estimator is the first-difference method. In the WG 
method, we express each variable as a deviation from that variable’s mean value. In the first-difference 
method, for each subject we take successive differences of the variables. Thus, for airline #1 we subtract 
the first observation of TC from the second observation of TC, the second observation of TC from the third 
observation of TC, and so on. We do this for each of the remaining variables and repeat this process for 
the remaining five airlines. After this process we have only 14 observations for each airline, since the first 
observation has no previous value. As a result, we now have 84 observations instead of the original 90 obser- 
vations. We then regress the first-differenced values of the TC variable on the first-differenced values of the 
explanatory variables as follows: 


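$$\Delta TC_{it} = \beta_2 \Delta Q_{it} + \beta_3 \Delta PF_{it} + \beta_4 \Delta LF_{it} + (u_{it} - u_{i,t-1}) \tag{16.5.4}$$

where there are 84 differenced observations in all and $\Delta TC_{it} = (TC_{it} - TC_{i,t-1})$; the other variables are differenced in the same way. As noted in Chapter 11, $\Delta$ is called the first difference operator.10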
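The first-difference transformation and regression (16.5.4) can be sketched as below. The data are simulated (hypothetical), rows are assumed to be sorted by time within each firm, NumPy is assumed, and `first_difference` is an illustrative helper. With 6 firms and 15 periods, the differenced sample has 84 rows, matching the count in the text.

```python
import numpy as np

def first_difference(values, groups):
    """Successive differences within each group (rows sorted by time within group).
    The first observation of every group is lost because it has no predecessor."""
    values = np.asarray(values, dtype=float)
    pieces = [np.diff(values[groups == g], axis=0) for g in np.unique(groups)]
    return np.concatenate(pieces, axis=0)

rng = np.random.default_rng(7)
n_firms, n_periods = 6, 15
groups = np.repeat(np.arange(n_firms), n_periods)
alpha = rng.normal(0.0, 5.0, n_firms)                                 # fixed effects
X = rng.normal(size=(n_firms * n_periods, 3)) + alpha[groups, None]
beta_true = np.array([2.0, -1.0, 0.5])
y = alpha[groups] + X @ beta_true + rng.normal(0.0, 0.1, n_firms * n_periods)

dy = first_difference(y, groups)     # the alpha_i cancel out in each difference
dX = first_difference(X, groups)
beta_fd, *_ = np.linalg.lstsq(dX, dy, rcond=None)   # no intercept, as in Eq. (16.5.4)

print(dX.shape[0])   # 84 = 6 firms x 14 differences
print(beta_fd)       # close to beta_true
```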


8This is also true of the LSDV model. 


9Dimitrios Asteriou and Stephen C. Hall, Applied Econometrics: A Modern Approach, Palgrave Macmillan, New York, 2007,
p. 347.


10Notice that Eq. (16.5.4) has no intercept term (why?), but we can include it if there is a trend variable in the original
model.
In passing, note that the original disturbance term is now replaced by the difference between the current
and previous values of the disturbance term. If the original disturbance term is not autocorrelated, the transformed disturbance is, and therefore it poses the kinds of estimation problems that we discussed in Chapter
12. However, if the explanatory variables are strictly exogenous, the first difference estimator is unbiased,
given the values of the explanatory variables. Also note that the first-difference method has the same disad- 
vantages as the WG method in that the explanatory variables that remain fixed over time for an individual are 
wiped out in the first-difference transformation. 

It may be pointed out that the first difference and fixed effects estimators are the same when we have only 
two time periods, but if there are more than two periods, these estimators differ. The reasons for this are 
rather involved and the interested reader may consult the references.11 It is left as an exercise for the reader to


apply the first difference method to our airlines example and compare the results with the other fixed effects 
estimators. 
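The equality of the first-difference and fixed effects estimators when there are only two time periods is easy to confirm numerically. This sketch uses simulated data (hypothetical) with a single regressor and assumes NumPy. With $T = 2$, each de-meaned observation equals plus or minus half the first difference, so the two slope formulas coincide; with $T = 3$ they generally do not.

```python
import numpy as np

def within_slope(x, y, groups):
    """Fixed effects (within-group) slope with a single regressor."""
    counts = np.bincount(groups)
    x_dm = x - (np.bincount(groups, weights=x) / counts)[groups]
    y_dm = y - (np.bincount(groups, weights=y) / counts)[groups]
    return np.sum(x_dm * y_dm) / np.sum(x_dm ** 2)

def first_difference_slope(x, y, n_units, T):
    """First-difference slope (no intercept), single regressor; rows sorted by unit, then time."""
    dx = np.diff(x.reshape(n_units, T), axis=1).ravel()
    dy = np.diff(y.reshape(n_units, T), axis=1).ravel()
    return np.sum(dx * dy) / np.sum(dx ** 2)

rng = np.random.default_rng(3)
n_units = 50
results = {}
for T in (2, 3):
    groups = np.repeat(np.arange(n_units), T)
    alpha = rng.normal(size=n_units)
    x = rng.normal(size=n_units * T) + alpha[groups]
    y = 1.5 * x + alpha[groups] + rng.normal(size=n_units * T)
    results[T] = (within_slope(x, y, groups), first_difference_slope(x, y, n_units, T))
    print(T, results[T])   # identical pair for T = 2; generally different for T = 3
```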


16.6 The Random Effects Model (REM) 


Commenting on fixed effect, or LSDV, modeling, Kmenta writes:12


An obvious question in connection with the covariance [i.e., LSDV] model is whether the inclusion of the dummy
variables—and the consequent loss of the number of degrees of freedom—is really necessary. The reasoning 
underlying the covariance model is that in specifying the regression model we have failed to include relevant
explanatory variables that do not change over time (and possibly others that do change over time but have the 
same value for all cross-sectional units), and that the inclusion of dummy variables is a coverup of our ignorance. 


If the dummy variables do in fact represent a lack of knowledge about the (true) model, why not express 
this ignorance through the disturbance term? This is precisely the approach suggested by the proponents of 
the so-called error components model (ECM) or random effects model (REM), which we will now illus- 
trate with our airline cost function. 

The basic idea is to start with Eq. (16.4.1): 


$$TC_{it} = \beta_{1i} + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + u_{it} \tag{16.6.1}$$


Instead of treating $\beta_{1i}$ as fixed, we assume that it is a random variable with a mean value of $\beta_1$ (no subscript
$i$ here). The intercept value for an individual company can be expressed as

$$\beta_{1i} = \beta_1 + \varepsilon_i \tag{16.6.2}$$

where $\varepsilon_i$ is a random error term with a mean value of zero and a variance of $\sigma_\varepsilon^2$.

What we are essentially saying is that the six firms included in our sample are a drawing from a much
larger universe of such companies and that they have a common mean value for the intercept ($= \beta_1$). The
individual differences in the intercept values of each company are reflected in the error term $\varepsilon_i$.


Substituting Eq. (16.6.2) into Eq. (16.6.1), we obtain:

$$\begin{aligned} TC_{it} &= \beta_1 + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + \varepsilon_i + u_{it} \\ &= \beta_1 + \beta_2 Q_{it} + \beta_3 PF_{it} + \beta_4 LF_{it} + w_{it} \end{aligned} \tag{16.6.3}$$

where

$$w_{it} = \varepsilon_i + u_{it} \tag{16.6.4}$$

11See in particular Jeffrey M. Wooldridge, Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge, Mass.,
2002, pp. 279-283.
12Jan Kmenta, Elements of Econometrics, 2d ed., Macmillan, New York, 1986, p. 633.
The composite error term $w_{it}$ consists of two components: $\varepsilon_i$, which is the cross-section, or individual-specific, error component, and $u_{it}$, which is the combined time series and cross-section error component and
is sometimes called the idiosyncratic term because it varies over cross-section (i.e., subject) as well as time.
The error components model (ECM) is so named because the composite error term consists of two (or more)
error components.

The usual assumptions made by the ECM are that

$$\begin{aligned}
\varepsilon_i &\sim N(0, \sigma_\varepsilon^2) \\
u_{it} &\sim N(0, \sigma_u^2) \\
E(\varepsilon_i u_{it}) &= 0; \quad E(\varepsilon_i \varepsilon_j) = 0 \quad (i \neq j) \\
E(u_{it} u_{is}) &= E(u_{it} u_{jt}) = E(u_{it} u_{js}) = 0 \quad (i \neq j;\ t \neq s)
\end{aligned} \tag{16.6.5}$$

that is, the individual error components are not correlated with each other and are not autocorrelated across
both cross-section and time series units. It is also very important to note that the ECM assumes that $w_{it}$ is not correlated with any of
the explanatory variables included in the model. Since $\varepsilon_i$ is a component of $w_{it}$, it is possible that the latter
is correlated with the explanatory variables. If that is indeed the case, the ECM will result in inconsistent
estimation of the regression coefficients. Shortly, we will discuss the Hausman test, which will tell us in a
given application if $w_{it}$ is correlated with the explanatory variables, that is, whether ECM is the appropriate
model.

Notice carefully the difference between FEM and ECM. In FEM each cross-sectional unit has its own
(fixed) intercept value, in all N such values for N cross-sectional units. In ECM, on the other hand, the
(common) intercept represents the mean value of all the (cross-sectional) intercepts and the error component
$\varepsilon_i$ represents the (random) deviation of individual intercept from this mean value. Keep in mind, however, that
$\varepsilon_i$ is not directly observable; it is what is known as an unobservable, or latent, variable.

As a result of the assumptions stated in Eq. (16.6.5), it follows that

$$E(w_{it}) = 0 \tag{16.6.6}$$

$$\operatorname{var}(w_{it}) = \sigma_\varepsilon^2 + \sigma_u^2 \tag{16.6.7}$$

Now if $\sigma_\varepsilon^2 = 0$, there is no difference between models (16.3.1) and (16.6.3) and we can simply pool all the
(cross-sectional and time series) observations and run the pooled regression, as we did in Eq. (16.3.1). This
is true because in this situation there are either no subject-specific effects or they have all been accounted for
in the explanatory variables.

As Eq. (16.6.7) shows, the error term is homoscedastic. However, it can be shown that $w_{it}$ and $w_{is}$ ($t \neq s$) are
correlated; that is, the error terms of a given cross-sectional unit at two different points in time are correlated.
The correlation coefficient, $\operatorname{corr}(w_{it}, w_{is})$, is as follows:

$$\rho = \operatorname{corr}(w_{it}, w_{is}) = \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + \sigma_u^2}; \quad t \neq s \tag{16.6.8}$$

Notice two special features of the preceding correlation coefficient. First, for any given cross-sectional 
unit, the value of the correlation between error terms at two different times remains the same no matter 
how far apart the two time periods are, as is clear from Eq. (16.6.8). This is in strong contrast to the first- 
order [AR(1)] scheme that we discussed in Chapter 12, where we found that the correlation between periods 
declines over time. Second, the correlation structure given in Eq. (16.6.8) remains the same for all cross- 
sectional units; that is, it is identical for all subjects. 
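To see how Eq. (16.6.8) differs from an AR(1) pattern, the sketch below (pure Python) builds the error covariance matrix of $w_{it}$ for one cross-sectional unit and compares correlations at increasing lags with those of an AR(1) process. The variance components and the AR(1) coefficient are hypothetical values chosen for illustration.

```python
def rem_error_cov(sigma2_eps, sigma2_u, T):
    """T x T covariance matrix of w_it = eps_i + u_it for one cross-sectional unit:
    sigma2_eps + sigma2_u on the diagonal, sigma2_eps everywhere else."""
    return [[sigma2_eps + (sigma2_u if t == s else 0.0) for s in range(T)] for t in range(T)]

def corr_from_cov(cov, t, s):
    """Correlation between periods t and s implied by a covariance matrix."""
    return cov[t][s] / (cov[t][t] * cov[s][s]) ** 0.5

sigma2_eps, sigma2_u, T = 1.0, 3.0, 15      # hypothetical variance components
cov = rem_error_cov(sigma2_eps, sigma2_u, T)
ar1_rho = 0.25                              # hypothetical AR(1) coefficient

for lag in (1, 2, 5, 10):
    rem_corr = corr_from_cov(cov, 0, lag)   # sigma2_eps / (sigma2_eps + sigma2_u) at every lag
    ar1_corr = ar1_rho ** lag               # AR(1): correlation decays geometrically
    print(f"lag {lag:2d}: ECM {rem_corr:.4f}   AR(1) {ar1_corr:.6f}")
```

Both processes have the same correlation (0.25) at lag 1 here, but only the AR(1) correlation dies out as the lag grows.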
If we do not take this correlation structure into account, and estimate Eq. (16.6.3) by OLS, the resulting
estimators will be inefficient. The most appropriate method here is the method of generalized least squares 
(GLS). 

We will not discuss the mathematics of GLS in the present context because of its complexity.13 Since most
modern statistical software packages now have routines to estimate ECM (as well as FEM), we will present 
the results for our illustrative example only. But before we do that, it may be noted that we can easily extend 


Eq. (16.4.2) to allow for a random error component to take into account variation over time (see Exercise 
16.6). 
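Although the text does not go into the GLS algebra, one standard way to see what the software does in a balanced panel (see, e.g., Wooldridge's treatment of random effects) is that random effects GLS amounts to OLS on quasi-demeaned data, $y_{it} - \theta \bar{y}_i$ (and likewise for the intercept and each regressor), with $\theta = 1 - \sqrt{\sigma_u^2 / (\sigma_u^2 + T\sigma_\varepsilon^2)}$. The sketch below computes $\theta$ from the variance share $\rho = 0.2067$ reported for the airline data in Table 16.5, with $T = 15$. It illustrates that textbook formula and is not a reproduction of the Swamy-Arora routine used to produce the table; the function names are illustrative.

```python
import math

def re_theta(sigma2_eps, sigma2_u, T):
    """Quasi-demeaning weight for random effects GLS in a balanced panel."""
    return 1.0 - math.sqrt(sigma2_u / (sigma2_u + T * sigma2_eps))

def quasi_demean(series_by_unit, theta):
    """Transform y_it -> y_it - theta * (mean of y for unit i), unit by unit."""
    out = []
    for series in series_by_unit:
        unit_mean = sum(series) / len(series)
        out.append([value - theta * unit_mean for value in series])
    return out

rho = 0.2067                                 # share of error variance due to eps_i (Table 16.5)
theta = re_theta(rho, 1.0 - rho, T=15)       # only the ratio of the two variances matters
print(round(theta, 3))                       # 0.549
```

The two limiting cases show where the random effects estimator sits: $\theta = 0$ (no individual effect) gives pooled OLS, and $\theta = 1$ gives the within-group estimator.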


The results of ECM estimation of the airline cost function are presented in Table 16.5. 


Table 16.5

Dependent Variable: TC
Method: Panel EGLS (Cross-section random effects)
Sample: 1-15
Periods included: 15
Cross-sections included: 6
Total panel (balanced) observations: 90
Swamy and Arora estimator of component variances

Variable    Coefficient    Std. Error     t-Statistic    Prob.
C           1074293.       303966.2       3.534261       0.0007
Q           2288588.       88172.77       25.95572       0.0000
PF          [illegible]    0.083298       [illegible]    0.0000
LF          -3084994.      584373.2       -5.279151      0.0000

Effects Specification
                              S.D.           Rho
Cross-section random          [illegible]    0.2067
Idiosyncratic random          210422.8       0.7933

Firm    Effect
1       -270615.0
2       [illegible]
3       -21338.40
4       187142.9
5       134488.9
6       57383.09


Notice these features of the REM. The (average) intercept value is 1074293. The (differential) intercept
values of the six entities are given at the bottom of the regression results. Firm number 1, for example, has an
intercept value which is 270615 units lower than the common intercept value of 1074293; the actual value
of the intercept for this airline is then 803678. On the other hand, the intercept value of firm number 6 is
higher by 57383 units than the common intercept value; the actual intercept value for this airline is (1074293
+ 57383), or 1131676. The intercept values for the other airlines can be derived similarly. However, note
that if you add the (differential) intercept values of all the six airlines, the sum is 0, as it should be (why?).
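The firm-specific intercepts can be computed directly from the bottom of Table 16.5. In this pure-Python sketch, the common intercept is taken as 1074293, which is consistent with the reported standard error (303966.2) and t statistic (about 3.53); a value of 107429.3 would imply a t statistic of only about 0.35. The effect for firm 2 is illegible in the table as reproduced here, so it is backed out from the zero-sum property mentioned in the text. That makes it an inferred number, not a reported one.

```python
common_intercept = 1074293.0     # C in Table 16.5
effects = {1: -270615.0, 3: -21338.40, 4: 187142.9, 5: 134488.9, 6: 57383.09}

# The differential intercepts sum to zero, so firm 2's effect is minus the sum of the rest
effects[2] = -sum(effects.values())

intercepts = {firm: common_intercept + effect for firm, effect in sorted(effects.items())}
for firm, value in intercepts.items():
    print(f"firm {firm}: effect {effects[firm]:>11.2f}  intercept {value:>12.2f}")
```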


13See Kmenta, op. cit., pp. 625-630. 
If you compare the results of the fixed-effect and random-effect regressions, you will see that there are
substantial differences between the two. The important question now is: Which results are reliable? Or, to put
it differently, which of the two models should we choose? We can apply the Hausman test to shed
light on this question. 

The null hypothesis underlying the Hausman test is that the FEM and ECM estimators do not differ 
substantially. The test statistic developed by Hausman has an asymptotic $\chi^2$ distribution. If the null hypothesis
is rejected, the conclusion is that the ECM is not appropriate because the random effects are probably corre- 
lated with one or more regressors. In this case, FEM is preferred to ECM. For our example, the results of the 
Hausman test are as shown in Table 16.6. 


Table 16.6

Correlated Random Effects - Hausman Test
Equation: Untitled
Test cross-section random effects

Test Summary            Chi-Sq. Statistic    Chi-Sq. d.f.    Prob.
Cross-section random    49.619687            3               0.0000

Cross-section random effects test comparisons:

Variable    Fixed          Random         Var(Diff.)       Prob.
Q           3319023.28     2288587.95     21587779733.     0.0000
PF          0.773071       [illegible]    0.002532         0.0000
LF          [illegible]    -3084994.70    35225469544.     0.0001


The Hausman test clearly rejects the null hypothesis, for the estimated $\chi^2$ value for 3 df is highly
significant; if the null hypothesis were true, the probability of obtaining a chi-square value of as much as 
49.62 or greater would be practically zero. As a result, we can reject the ECM (REM) in favor of FEM. 
Incidentally, the last part of the preceding table compares the fixed-effect and random-effect coefficients of 
each variable and, as the last column shows, in the present example the differences are statistically significant. 
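The per-variable comparisons at the bottom of Table 16.6 follow from a simple formula: the squared difference between the fixed- and random-effects coefficients divided by the variance of that difference, which is asymptotically $\chi^2$ with 1 df. The pure-Python sketch below (helper names are illustrative) recomputes it for Q and evaluates the p-value of the joint statistic 49.62 with 3 df, using closed-form chi-square tail probabilities for 1 and 3 df. The joint statistic itself requires the full covariance matrix of the coefficient differences, which the table does not report, so it is taken as given.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 3 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 3:
        return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    raise ValueError("this sketch handles only df = 1 and df = 3")

def hausman_single(b_fixed, b_random, var_diff):
    """Single-coefficient Hausman statistic: (b_FE - b_RE)^2 / [var(b_FE) - var(b_RE)]."""
    return (b_fixed - b_random) ** 2 / var_diff

# Q row of Table 16.6
h_q = hausman_single(3319023.28, 2288587.95, 21587779733.0)
print(f"Q: statistic {h_q:.2f}, p-value {chi2_sf(h_q, 1):.2e}")

# Joint test reported in Table 16.6
print(f"joint: p-value {chi2_sf(49.619687, 3):.2e}")
```

Applying the same formula to LF, with a fixed-effects coefficient of about −3797368 from Table 16.3, gives a statistic of roughly 14.4 and a p-value of about 0.0001, in line with the last row of the table.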


Breusch and Pagan Lagrange Multiplier Test14


Besides the Hausman test, we can also use the Breusch-Pagan (BP) test to test the hypothesis that there are
no random effects, i.e., that $\sigma_\varepsilon^2$ in Eq. (16.6.7) is zero. This test is built into software packages such as STATA.
Under the null hypothesis, BP follows a chi-square distribution with 1 df; there is only 1 df because we are
testing the single hypothesis that $\sigma_\varepsilon^2 = 0$. We will not present the formula underlying the test, for it is rather
complicated.

Turning to our airlines example, an application of the BP test produces a chi-square value of 0.61. With 
1 df, the p value of obtaining a chi-square value of 0.61 or greater is about 43 percent. Therefore, we do not 
reject the null hypothesis. In other words, the random effects model is not appropriate in the present example. 
The BP test thus reinforces the Hausman test, which also found that the random effects model is not appro- 
priate for our airlines example. 
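For readers who want to see the mechanics, one common textbook form of the Breusch-Pagan LM statistic for a balanced panel uses only the residuals $e_{it}$ from the pooled OLS regression: $LM = \frac{NT}{2(T-1)}\left[\frac{\sum_i (\sum_t e_{it})^2}{\sum_i \sum_t e_{it}^2} - 1\right]^2$. The pure-Python sketch below implements that form (function names are illustrative; it is not claimed to match any particular package's implementation) and computes the 1-df p-value for the airline statistic of 0.61 reported in the text.

```python
import math

def breusch_pagan_lm(residuals_by_unit):
    """LM statistic for H0: sigma_eps^2 = 0, from pooled OLS residuals of a balanced panel.
    residuals_by_unit is a list of N lists, each holding the T residuals of one unit."""
    n_units = len(residuals_by_unit)
    T = len(residuals_by_unit[0])
    sum_sq_of_unit_sums = sum(sum(r) ** 2 for r in residuals_by_unit)
    sum_sq = sum(e * e for r in residuals_by_unit for e in r)
    return n_units * T / (2.0 * (T - 1)) * (sum_sq_of_unit_sums / sum_sq - 1.0) ** 2

def chi2_1_sf(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

p_airlines = chi2_1_sf(0.61)
print(f"p-value for LM = 0.61 with 1 df: {p_airlines:.3f}")   # about 0.435
```

Intuitively, if the residuals of a unit share a common component, their within-unit sums are large relative to the total sum of squares, which pushes the bracketed ratio above 1 and the statistic away from zero.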


14T. Breusch and A. R. Pagan, “The Lagrange Multiplier Test and Its Application to Model Specification in Econometrics,”
Review of Economic Studies, vol. 47, 1980, pp. 239-253. 
16.7 Properties of Various Estimators15


We have discussed several methods of estimating (linear) panel regression models, namely, pooled estimators, 
fixed effects estimators that include least squares dummy variable (LSDV) estimators, fixed-effect within- 
group estimators, first-difference estimators, and random effects estimators. What are their statistical 
properties? Since panel data generally involve a large number of observations, we will concentrate on the 
consistency property of these estimators. 


Pooled Estimators 


Assuming the slope coefficients are constant across subjects, if the error term in Eq. (16.3.1) is uncorrelated 
with the regressors, pooled estimators are consistent. However, as noted earlier, the error terms are likely 
to be correlated over time for a given subject. Therefore, panel-corrected standard errors must be used 
for hypothesis testing. Make sure the statistical package you use has this facility, otherwise the computed 
standard errors may be underestimated. It should be noted that if the fixed effects model is appropriate but we 
use the pooled estimator, the estimated coefficients will be inconsistent. 


Fixed Effects Estimators 


Whether the underlying model is pooled, fixed effects, or random effects, the fixed effects estimators are
consistent.


Random Effects Estimators 


The random effects estimator is consistent even if the true model is the pooled model. However, if the true
model is fixed effects, the random effects estimator is inconsistent.

For proofs and further details about these properties, refer to the textbooks of Cameron and Trivedi, 
Greene, and Wooldridge cited in the footnotes. 
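The consistency statements above can be illustrated with a small simulation. In this sketch (NumPy assumed; all data simulated and the parameter values hypothetical), the unit effects are correlated with the regressor, so the fixed effects model is the true one. The within estimator stays close to the true slope of 1, while the pooled OLS slope is pulled toward the value implied by the omitted effect (about 2.5 with these settings), the situation pictured in Figure 16.1.

```python
import numpy as np

rng = np.random.default_rng(2024)
n_units, n_periods, beta = 500, 5, 1.0
groups = np.repeat(np.arange(n_units), n_periods)

a = rng.normal(size=n_units)                             # unobserved unit-level factor
x = a[groups] + rng.normal(size=n_units * n_periods)     # regressor correlated with it
alpha = 3.0 * a                                          # fixed effect, correlated with x
y = alpha[groups] + beta * x + rng.normal(size=n_units * n_periods)

# Pooled OLS slope (one common intercept)
xc, yc = x - x.mean(), y - y.mean()
b_pooled = np.sum(xc * yc) / np.sum(xc ** 2)

# Within (fixed effects) slope: de-mean by unit
x_dm = x - (np.bincount(groups, weights=x) / n_periods)[groups]
y_dm = y - (np.bincount(groups, weights=y) / n_periods)[groups]
b_within = np.sum(x_dm * y_dm) / np.sum(x_dm ** 2)

print(f"pooled {b_pooled:.3f}, within {b_within:.3f}, true {beta}")
```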


16.8 Fixed Effects versus Random Effects Model: Some Guidelines 


The challenge facing a researcher is: Which model is better, FEM or ECM? The answer to this question 
hinges on the assumption we make about the likely correlation between the individual, or cross-section
specific, error component $\varepsilon_i$ and the X regressors.

If it is assumed that $\varepsilon_i$ and the X's are uncorrelated, ECM may be appropriate, whereas if $\varepsilon_i$ and the X's are
correlated, FEM may be appropriate.

The assumption underlying ECM is that the $\varepsilon_i$ are random drawings from a much larger population, but
sometimes this may not be so. For example, suppose we want to study the crime rate across the 25 states in 
India. Obviously, in this case, the assumption that the 25 states are a random sample is not tenable. 

Keeping this fundamental difference in the two approaches in mind, what more can we say about the 
choice between FEM and ECM? Here the observations made by Judge et al. may be helpful:16


1. If T (the number of time periods) is large and N (the number of cross-sectional units) is small, there
is likely to be little difference in the values of the parameters estimated by FEM and ECM. Hence the 
choice here is based on computational convenience. On this score, FEM may be preferable. 


15The following discussion draws on A. Colin Cameron and Pravin K. Trivedi, Microeconometrics: Methods and Applications, 
Cambridge University Press, Cambridge, New York, 2005, Chapter 21. 
16Judge et al., op. cit., pp. 489-491.
2. When N is large and T is small (i.e., a short panel), the estimates obtained by the two methods can
differ significantly. Recall that in ECM $\beta_{1i} = \beta_1 + \varepsilon_i$, where $\varepsilon_i$ is the cross-sectional random component,
whereas in FEM we treat $\beta_{1i}$ as fixed and not random. In the latter case, statistical inference is conditional on the observed cross-sectional units in the sample. This is appropriate if we strongly believe that
the individual, or cross-sectional, units in our sample are not random drawings from a larger population.
In that case, FEM is appropriate. If the cross-sectional units in the sample are regarded as random
drawings, however, then ECM is appropriate, for in that case statistical inference is unconditional.

3. If the individual error component $\varepsilon_i$ and one or more regressors are correlated, then the ECM estimators
are biased, whereas those obtained from FEM are unbiased.

4. If N is large and T is small, and if the assumptions underlying ECM hold, ECM estimators are more 
efficient than FEM. 

5. Unlike FEM, ECM can estimate coefficients of time-invariant variables such as gender and ethnicity. 
The FEM does control for such time-invariant variables, but it cannot estimate them directly, as is 
clear from the LSDV or within-group estimator models. On the other hand, FEM controls for all 
time-invariant variables (why?), whereas ECM can estimate only such time-invariant variables as are 
explicitly introduced in the model. 


Despite the Hausman test, it is important to keep in mind the warning sounded by Johnston and DiNardo. 
In deciding between fixed effects or random effects models, they argue that, “ ... there is no simple rule to 
help the researcher navigate past the Scylla of fixed effects and the Charybdis of measurement error and 
dynamic selection. Although they are an improvement over cross-section data, panel data do not provide a 
cure-all for all of an econometrician’s problems.”17


16.9 Panel Data Regressions: Some Concluding Comments 


As noted at the outset, the topic of panel data modeling is vast and complex. We have barely scratched the 
surface. The following are among the many topics we have not discussed. 


1. Hypothesis testing with panel data.

2. Heteroscedasticity and autocorrelation in ECM.

3. Unbalanced panel data.

4. Dynamic panel data models in which the lagged value(s) of the regressand appears as an explanatory
variable.

5. Simultaneous equations involving panel data.

6. Qualitative dependent variables and panel data.

7. Unit roots in panel data (on unit roots, see Chapter 21).


wv 


ae oS 


One or more of these topics can be found in the references cited in this chapter, and the reader is urged to 
consult them to learn more about this topic. These references also cite several empirical studies in various 
areas of business and economics that have used panel data regression models. The beginner is well-advised 
to read some of these applications to get a feel for how researchers have actually implemented such models.18


17Jack Johnston and John DiNardo, Econometric Methods, 4th ed., McGraw-Hill, 1997, p. 403.


18For further details and concrete applications, see Paul D. Allison, Fixed Effects Regression Methods for Longitudinal Data, 
Using SAS, SAS Institute, Cary, North Carolina, 2005. 
16.10 Some Illustrative Examples


Example 16.1 Productivity and Public Investment 


To find out why productivity has declined and what the role of public investment is, Alicia Munnell studied
productivity data for the 48 continental U.S. states for the 17 years from 1970 to 1986, for a total of 816
observations.19 Using these data, we estimated the pooled regression in Table 16.7. Note that this regression does not
take into account the panel nature of the data.


Table 16.7 


Dependent Variable: LGSP 
Method: Panel Least Squares 


Sample: 1970-1986 

Periods included: 17 

Cross-sections included: 48 

Total panel (balanced) observations: 816 


Variable     Coefficient   Std. Error    t Statistic   Prob.
C            0.907604      0.091328      9.937854      0.0000
LPRIVCAP     0.376001      [illegible]   [illegible]   0.0000
LPUBCAP      0.351478      0.016162      [illegible]   0.0000
LWATER       0.312959      0.018739      16.70062      0.0000
LUNEMP       -0.069886     [illegible]   -4.630528     0.0000

R-squared            0.981624      Mean dependent var.   10.50885
Adjusted R-squared   [illegible]   S.D. dependent var.   [illegible]
S.E. of regression   [illegible]   F-statistic           [illegible]
Sum squared resid.   [illegible]   Prob. (F-statistic)   0.000000
Log likelihood       456.2346      Durbin-Watson stat.   0.063016


The dependent variable in this model is GSP (gross state product), and the explanatory variables are: 
PRIVCAP (private capital), PUBCAP (public capital), WATER (water utility capital), and UNEMP (unemployment 
rate). Note: L stands for natural log. 

All the variables have the expected signs, and all are individually as well as collectively statistically
significant, provided the assumptions of the classical linear regression model hold.

To take into account the panel dimension of the data, in Table 16.8 we estimate a fixed effects model
using 47 dummies for the 48 states, so as to avoid falling into the dummy-variable trap. To save space, we
present only the estimated regression coefficients, not the individual dummy coefficients. It should be
added, however, that all 47 state dummies were individually highly statistically significant.
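Why 47 rather than 48 dummies? A minimal sketch, assuming only the panel layout of the Munnell data (48 states, 17 years, 816 rows) and none of its values, shows the dummy-variable trap directly. With a common intercept, the 48 state dummies sum to the intercept column, so the design matrix loses full column rank. Dropping one dummy restores it. The variable names are hypothetical, and the slope regressors are omitted because they do not affect the argument.

```python
import numpy as np

n_states, n_years = 48, 17
state = np.repeat(np.arange(n_states), n_years)                  # state index of each of the 816 rows
dummies = (state[:, None] == np.arange(n_states)).astype(float)  # 816 x 48 indicator matrix
intercept = np.ones((state.size, 1))

with_all_48 = np.hstack([intercept, dummies])      # intercept + 48 dummies: the trap
with_47 = np.hstack([intercept, dummies[:, 1:]])   # intercept + 47 dummies (state 0 as base)

print(with_all_48.shape, np.linalg.matrix_rank(with_all_48))   # (816, 49) 48
print(with_47.shape, np.linalg.matrix_rank(with_47))           # (816, 48) 48
```

In the first matrix, 49 columns carry only 48 independent directions, so OLS has no unique solution. In the second, each dummy coefficient measures a state's intercept relative to the omitted base state.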

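There are substantial differences between the pooled regression and the fixed effects regression. For
example, the LPUBCAP coefficient roughly doubles (from 0.351 to 0.714), the LWATER coefficient falls from
0.313 to 0.088, and the intercept is no longer statistically significant. These differences cast doubt on the
results of the pooled regression.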

To see if the random effects model is more appropriate in this case, we present the results of the random 
effects regression model in Table 16.9. 

To choose between the two models, we use the Hausman test, which gives the results shown in 
Table 16.10. 


19The Munnell data can be found at www.aw-bc.com/murray. 
Table 16.8


Dependent Variable: LGSP 
Method: Panel Least Squares 


Sample: 1970-1986 

Periods included: 17 

Cross-sections included: 48 

Total panel (balanced) observations: 816 


Variable     Coefficient   Std. Error    t Statistic   Prob.
C            -0.033235     0.208648      -0.159286     0.8735
LPRIVCAP     0.267096      0.037015      7.215864      0.0000
LPUBCAP      0.714094      0.026520      26.92636      0.0000
LWATER       0.088272      [illegible]   4.090291      0.0000
LUNEMP       -0.138854     0.007851      -17.68611     0.0000

Effects Specification

Cross-section fixed (dummy variables)

R-squared            0.997634   Mean dependent var.   10.50885
Adjusted R-squared   0.997476   S.D. dependent var.   [illegible]
S.E. of regression   0.051303   F-statistic           6315.897
Sum squared resid.   2.010854   Prob. (F-statistic)   0.000000
Log likelihood       1292.535   Durbin-Watson stat.   0.520682

Table 16.9 


Dependent Variable: LGSP 
Method: Panel EGLS (Cross-section random effects) 


Sample: 1970-1986 

Periods included: 17 

Cross-sections included: 48 

Total panel (balanced) observations: 816 

Swamy and Arora estimator of component variances 


Variable     Coefficient   Std. Error    t Statistic   Prob.
C            -0.046176     [illegible]   -0.285680     [illegible]
LPRIVCAP     0.313980      0.029740      10.55760      0.0000
LPUBCAP      0.641926      [illegible]   [illegible]   0.0000
LWATER       0.130768      0.020281      6.447875      0.0000
LUNEMP       -0.139820     0.007442      [illegible]   0.0000

Effects Specification

                         S.D.          Rho
Cross-section random     [illegible]   0.8655
Idiosyncratic random     0.051303      0.1345

Table 16.10

Test Summary             Chi-Sq. Statistic   Chi-Sq. d.f.   Prob.
Cross-section random     42.458353           4              0.0000

Cross-section random effects test comparisons:

Variable     Fixed        Random       Var(Diff.)   Prob.
LPRIVCAP     0.267096     0.313980     0.000486     0.0334
LPUBCAP      0.714094     0.641926     0.000159     0.0000
LWATER       0.088272     0.130768     0.000054     0.0000
LUNEMP       -0.138854    -0.139820    0.000006     0.693


Since the estimated chi-square value is highly statistically significant, we reject the hypothesis that there is 
no significant difference in the estimated coefficients of the two models. It seems there is correlation between 
the error term and one or more regressors. Hence, we can reject the random effects model in favor of the 
fixed effects model. Note, however, as the last part of Table 16.10 shows, not all coefficients differ in the two 
models. For example, there is not a statistically significant difference in the values of the LUNEMP coefficient 
in the two models. 
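The Prob. column in the lower part of Table 16.10 can be approximately reproduced by dividing each coefficient difference by the square root of Var(Diff.) and referring the result to the standard normal distribution. The sketch below assumes this is how the comparisons are computed. It reproduces the reported 0.0334 for LPRIVCAP and gives about 0.69 for LUNEMP. The overall chi-square statistic of 42.46 cannot be recomputed this way, because it also uses the off-diagonal elements of the covariance matrix of the differences, which the table does not report. The function name `coef_comparison` is hypothetical.

```python
import math

# (fixed effects, random effects, Var(Diff.)) as reported in Table 16.10
rows = {
    "LPRIVCAP": (0.267096, 0.313980, 0.000486),
    "LPUBCAP": (0.714094, 0.641926, 0.000159),
    "LWATER": (0.088272, 0.130768, 0.000054),
    "LUNEMP": (-0.138854, -0.139820, 0.000006),
}

def coef_comparison(b_fixed, b_random, var_diff):
    """Return the z statistic and two-sided standard normal p value
    for the difference between one FE and one RE coefficient."""
    z = (b_fixed - b_random) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))   # P(|Z| > |z|) for Z ~ N(0, 1)
    return z, p

for name, (b_fe, b_re, v) in rows.items():
    z, p = coef_comparison(b_fe, b_re, v)
    print(f"{name:<9} z = {z:8.3f}   p = {p:.4f}")
```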


Example 16.2 Demand for Electricity in the USA 


In their article, Maddala et al. considered the demand for residential electricity and natural gas in 49 states in
the USA for the period 1970-1990; Hawaii was not included in the analysis.20 They collected data on several
variables; these data can be found on the book’s website. In this example, we will only consider the demand 
for residential electricity. We first present the results based on the fixed effects estimation (Table 16.11) and 
then the random effects estimation (Table 16.12), followed by a comparison of the two models. 


Table 16.11 


Dependent Variable: Log(ESRCBPC) 
Method: Panel Least Squares 


Sample: 1971-1990 

Periods included: 20 

Cross-sections included: 49 

Total panel (balanced) observations: 980 


Variable        Coefficient   Std. Error   t Statistic   Prob.
C               -12.55760     0.363436     -34.55249     0.0000
Log(RESRCD)     -0.628967     0.029089     -21.62236     0.0000
Log(YDPC)       1.062439      0.040280     26.37663      0.0000

Effects Specification

Cross-section fixed (dummy variables)

R-squared             0.757600      Mean dependent var.     -4.536187
Adjusted R-squared    0.744553      S.D. dependent var.     0.316205
S.E. of regression    0.159816      Akaike info criterion   -0.778954
Sum squared resid.    [illegible]   Schwarz criterion       -0.524602
Log likelihood        432.6876      Hannan-Quinn criter.    -0.682188
F-statistic           58.07007      Durbin-Watson stat.     0.404314
Prob. (F-statistic)   0.000000


20G. S. Maddala, Robert P. Trost, Hongyi Li, and Frederick Joutz, “Estimation of Short-run and Long-run Elasticities of
Demand from Panel Data Using Shrinkage Estimators,” Journal of Business and Economic Statistics, vol. 15, no. 1, January
1997, pp. 90-100.


where Log (ESRCBPC) = natural log of residential electricity consumption per capita (in billion btu), 
Log(RESRCD) = natural log of real 1987 electricity price, and Log(YDPC) = natural log of real 1987 disposable 
income per capita. 

Since this is a double-log model, the estimated slope coefficients represent elasticities. Thus, holding other 
things the same, if real per capita income goes up by 1 percent, the mean consumption of electricity goes 
up by about 1 percent. Likewise, holding other things constant, if the real price of electricity goes up by 1 
percent, the average consumption of electricity goes down by about 0.6 percent. All the estimated elasticities 
are statistically significant.
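Reading a double-log coefficient as "percent change in Y per 1 percent change in X" is a first-order approximation. The short sketch below assumes only the two slope estimates in Table 16.11. It compares that approximation (elasticity times the percentage change) with the exact change the model implies, 100[(1 + g/100)^β - 1] for a g percent change in X. The helper name `pct_change_consumption` is hypothetical.

```python
# Slope estimates (elasticities) from the fixed effects model in Table 16.11
PRICE_ELASTICITY = -0.628967
INCOME_ELASTICITY = 1.062439

def pct_change_consumption(elasticity, pct_change_x):
    """Exact percentage change in Y implied by ln Y = ... + elasticity * ln X
    when X rises by pct_change_x percent and the other regressors are held fixed."""
    return 100.0 * ((1.0 + pct_change_x / 100.0) ** elasticity - 1.0)

for pct in (1.0, 10.0):
    approx = PRICE_ELASTICITY * pct
    exact = pct_change_consumption(PRICE_ELASTICITY, pct)
    print(f"price +{pct:.0f}%: approximate {approx:+.3f}%, exact {exact:+.3f}%")
```

For a 1 percent price rise the two agree closely (about -0.63 versus -0.62 percent). For a 10 percent rise the exact decline is about 5.8 percent rather than 6.3 percent.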

The results of the random effects model are shown in Table 16.12.


Table 16.12 


Dependent Variable: Log(ESRCBPC) 
Method: Panel EGLS (Cross-section random effects)


Sample: 1971-1990 

Periods included: 20 

Cross-sections included: 49 

Total panel (balanced) observations: 980 

Swamy and Arora estimator of component variances 


Variable        Coefficient   Std. Error   t Statistic   Prob.
C               -11.68536     0.353285     -33.07631     0.0000
Log(RESRCD)     -0.665570     0.028088     -23.69612     0.0000
Log(YDPC)       0.980877      0.039257     24.98617      0.0000

Effects Specification

                         S.D.       Rho
Cross-section random     0.123560   0.3741
Idiosyncratic random     0.159816   0.6259

Weighted Statistics

R-squared             0.462591   Mean dependent var.   -1.260296
Adjusted R-squared    0.461491   S.D. dependent var.   0.229066
S.E. of regression    0.168096   Sum squared resid.    27.60641
F-statistic           420.4906   Durbin-Watson stat.   0.345453
Prob. (F-statistic)   0.000000

Unweighted Statistics

R-squared             0.267681   Mean dependent var.   -4.536187
Sum squared resid.    71.68384   Durbin-Watson stat.   0.133089


There does not seem to be much difference between the two models, but we can use the Hausman test to find out
whether this is so. The results of this test are shown in Table 16.13.


Table 16.13 


Correlated Random Effects—Hausman Test 
Equation: Untitled 
Test cross-section random effects 


Test Summary             Chi-Sq. Statistic   Chi-Sq. d.f.   Prob.
Cross-section random     105.865216          2              0.0000

Cross-section random effects test comparisons:

Variable        Fixed        Random       Var(Diff.)   Prob.
Log(RESRCD)     -0.628967    -0.665570    0.000057     0.0000
Log(YDPC)       1.062439     0.980877     0.000081     0.0000


Although the coefficients of the two models in Tables 16.11 and 16.12 look quite similar, the Hausman test 
shows that this is not the case. The chi-square value is highly statistically significant. Therefore, we can choose 
the fixed effects model over the random effects model. This example brings out the important point that when 
the sample size is large, in our case 980 observations, even small differences in the estimated coefficients of the two 
models can be statistically significant. Thus, the coefficients of the Log (RESRCD) variable in the two models look 
reasonably close, but statistically they are not. 


Example 16.3 Beer Consumption, Income and Beer Tax 


To assess the impact of beer tax on beer consumption, Philip Cook investigated the relationship between
the two, after allowing for the effect of income.21 His data pertain to the 50 states and Washington, D.C.,
for the period 1975-2000. In this example we study the relationship of per capita beer sales to tax rate 
and income, all at the state level. We present the results of pooled OLS, fixed effects, and random effects 
models in tabular form in Table 16.14. The dependent variable is per capita beer sales. 

These results are interesting. As per economic theory, we would expect a negative relationship 
between beer consumption and beer taxes, which is the case for the three models. The negative income 
effect on beer consumption would suggest that beer is an inferior good. An inferior good is one 
whose demand decreases as consumers’ income rises. Maybe when their income rises, consumers prefer 
champagne! 


21The data used here are obtained from the website of Michael P. Murray, Econometrics: A Modern Introduction,
Pearson/Addison Wesley, Boston, 2006, but the original data were collected by Philip Cook for his book, Paying the Tab: The Costs
and Benefits of Alcohol Control, Princeton University Press, Princeton, New Jersey, 2007.
Table 16.14

Variable     OLS          FEM          REM
Constant     1.4192       1.7617       1.7542
             (24.37)      (52.23)      (39.22)
Beer tax     -0.0067      -0.0183      -0.0181
             (-2.13)      (-9.67)      (-9.69)
Income       -3.54e-06    -0.000020    -0.000019
             (-1.12)      (-9.17)      (-9.10)
R²           0.0062       0.0052       0.0052

Notes: Figures in parentheses are the estimated t ratios. -3.54e-06 = -0.00000354.


For our purpose, what is interesting is the difference in the estimated coefficients. Apparently there 
is not much difference in estimated coefficients between FEM and ECM. As a matter of fact, the 
Hausman test produces a chi-square value of 3.4, which is not significant for 2 df at the 5 percent level; 
the p value is 0.1783. 
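When the number of degrees of freedom is even, the chi-square upper-tail probability has a closed form, exp(-x/2) Σ_{k<df/2} (x/2)^k / k!. This means the Hausman p values quoted in this section can be checked with the Python standard library alone. The helper below is a sketch under that restriction (it rejects odd df). Its name is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), valid only for even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)**0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k        # builds (x/2)**k / k! incrementally
        total += term
    return math.exp(-half) * total

print(round(chi2_sf_even_df(5.991, 2), 4))    # 5% critical value, 2 df
print(round(chi2_sf_even_df(3.4, 2), 4))      # beer example, 2 df
print(chi2_sf_even_df(42.458353, 4))          # Table 16.10, 4 df
```

The 5 percent critical value for 2 df, 5.991, returns almost exactly 0.05, and a statistic of 3.4 returns about 0.183. That value is slightly above the 0.1783 quoted in the text, presumably because the reported chi-square value is rounded (a p value of 0.1783 corresponds to a statistic of about 3.45). Either way, the null hypothesis is not rejected at the 5 percent level. The statistic of 42.458 from Table 16.10 gives a p value of the order of 10^-8.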

The results based on OLS, however, are vastly different. The coefficient of the beer tax variable, in 
absolute value, is much smaller than that obtained from FEM or ECM. The income variable, although 
it has the negative sign, is not statistically significant, whereas the other two models show that it is 
highly significant. 

This example shows very vividly what could happen if we neglect the panel structure of the data 
and estimate a pooled regression. 
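The mechanism behind this example can be illustrated with simulated data. The sketch below generates a hypothetical panel in which each unit's effect is positively correlated with the regressor. All parameter values are assumptions chosen for illustration, not estimates from Cook's data. Pooled OLS, which ignores the unit effects, is pulled toward zero, while the within (fixed effects) estimator recovers the true slope.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 200, 10, -0.5                       # hypothetical panel size and true slope

alpha = rng.normal(size=N)                       # unit-specific effects
x = alpha[:, None] + rng.normal(size=(N, T))     # regressor correlated with the effects
y = alpha[:, None] + beta * x + 0.5 * rng.normal(size=(N, T))

def ols_slope(xv, yv):
    """Slope coefficient from regressing yv on xv with an intercept."""
    xc, yc = xv - xv.mean(), yv - yv.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())

# Pooled OLS ignores the panel structure entirely.
pooled = ols_slope(x.ravel(), y.ravel())

# Within (fixed effects) estimator: demean each unit's observations first.
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
within = ols_slope(x_w.ravel(), y_w.ravel())

print(f"true slope {beta}, pooled OLS {pooled:.3f}, within {within:.3f}")
```

With these settings the pooled slope converges to β + Cov(α, X)/Var(X) = -0.5 + 0.5 = 0, so the attenuation is severe by construction. Demeaning removes α before the slope is estimated.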


Summary and Conclusions 


1. Panel regression models are based on panel data. Panel data consist of observations on the same cross- 
sectional, or individual, units over several time periods. 

2. There are several advantages to using panel data. First, they increase the sample size considerably. Second, 
by studying repeated cross-section observations, panel data are better suited to study the dynamics of 
change. Third, panel data enable us to study more complicated behavioral models. 

3. Despite their substantial advantages, panel data pose several estimation and inference problems. Since
such data involve both cross-section and time dimensions, problems that plague cross-sectional data 
(e.g., heteroscedasticity) and time series data (e.g., autocorrelation) need to be addressed. There are 
some additional problems as well, such as cross-correlation in individual units at the same point in 
time. 

4. There are several estimation techniques to address one or more of these problems. The two most 
prominent are (1) the fixed effects model (FEM) and (2) the random effects model (REM), or error 
components model (ECM). 

5. In FEM, the intercept in the regression model is allowed to differ among individuals in recognition of 
the fact that each individual, or cross-sectional, unit may have some special characteristics of its own. 
To take into account the differing intercepts, one can use dummy variables. The FEM using dummy 
variables is known as the least-squares dummy variable (LSDV) model. FEM is appropriate in situations 
where the individual-specific intercept may be correlated with one or more regressors. A disadvantage 
of LSDV is that it consumes a lot of degrees of freedom when the number of cross-sectional units, N, is 
very large, in which case we have to introduce N dummies (but suppress the common intercept term). 
6. An alternative to FEM is ECM. In ECM it is assumed that the intercept of an individual unit is a random
drawing from a much larger population with a constant mean value. The individual intercept is then
expressed as a deviation from this constant mean value. One advantage of ECM over FEM is that it
is economical in degrees of freedom, as we do not have to estimate N cross-sectional intercepts. We
need only estimate the mean value of the intercept and its variance. ECM is appropriate in situations
where the (random) intercept of each cross-sectional unit is uncorrelated with the regressors. Another
advantage of ECM is that we can introduce variables such as gender, religion, and ethnicity, which
remain constant for a given subject. In FEM we cannot do that because all such variables are collinear
with the subject-specific intercept. Moreover, if we use the within-group estimator or first-difference
estimator, all such time-invariant variables will be swept out.

7. The Hausman test can be used to decide between FEM and ECM. We can also use the Breusch-Pagan 
test to see if ECM is appropriate. 
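The Breusch-Pagan Lagrange multiplier test mentioned in point 7 uses only the pooled OLS residuals ê_it. For a balanced panel the statistic is LM = NT/[2(T - 1)] × [Σ_i(Σ_t ê_it)² / Σ_i Σ_t ê_it² - 1]². Under the null hypothesis that the variance of the individual error component is zero, it is asymptotically χ² with 1 df. The sketch below implements that formula. The function name and the simulated "residuals" (random draws standing in for actual OLS residuals) are assumptions for illustration.

```python
import math
import numpy as np

def breusch_pagan_lm(resid):
    """Breusch-Pagan LM test of H0: variance of the individual effect is zero.

    resid: N x T array of pooled OLS residuals from a balanced panel
    (row i = unit i, column t = period t). Returns (LM, p value), where
    LM is asymptotically chi-square with 1 df under H0.
    """
    e = np.asarray(resid, dtype=float)
    n, t = e.shape
    ratio = (e.sum(axis=1) ** 2).sum() / (e ** 2).sum()
    lm = n * t / (2.0 * (t - 1)) * (ratio - 1.0) ** 2
    p = math.erfc(math.sqrt(lm / 2.0))      # chi-square(1) upper tail = P(|Z| > sqrt(LM))
    return float(lm), p

rng = np.random.default_rng(1)
N, T = 100, 5
no_effect = rng.normal(size=(N, T))                               # no individual component
with_effect = rng.normal(size=(N, 1)) + rng.normal(size=(N, T))   # unit-level component added
print(breusch_pagan_lm(no_effect))
print(breusch_pagan_lm(with_effect))
```

A large LM (small p value) suggests that a random individual effect is present, so ECM is preferable to pooled OLS. The Hausman test then decides between ECM and FEM.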

8. Despite their increasing popularity in applied research, and despite the increasing availability of such
data, panel data regressions may not be appropriate in every situation. One has to use some practical 
judgment in each case. 

9. There are some specific problems with panel data that need to be borne in mind. The most serious is the 
problem of attrition, whereby, for one reason or another, subjects of the panel drop out over time so that 
over subsequent surveys (or cross-sections) fewer original subjects remain in the panel. Even if there is 
no attrition, over time subjects may refuse or be unwilling to answer some questions. 


Multiple Choice Questions 


1. In panel data,
a. The same cross-sectional units are surveyed over time
b. Different cross-sectional units are surveyed over time 
c. Different cross-sectional units are surveyed at a point in time 
d. Cross-sectional units are surveyed in detail 
2. A data set in which each subject has the same number of observations is called a
a. Short panel 
b. Long panel 
c. Balanced panel 
d. Unbalanced panel 
3. In a dataset, if the number of cross-sectional subjects N is greater than the number of time periods T, 
then the data is of the type 
a. Short panel 
b. Long panel 
c. Balanced panel 
d. Unbalanced panel 
4. A variable that does not depend on the past, current, and future values of the error term is called a
a. Strictly endogenous variable
b. Strictly exogenous variable
c. Partially endogenous variable
d. Partially exogenous variable
5. The pooled OLS regression model is also known as the
a. Constant coefficient model
b. Constant variance model
c. Constant correlation model
d. Constant variable model
6. In which of the following models are both the intercept and the slope coefficients fixed across individual
subjects and over time?
a. Pooled OLS model 
b. Fixed effect least squares dummy variable model 
c. Fixed effect within-group model 
d. Random effect model 
7. In which of the following models does the intercept vary across subjects but remain time-invariant?
a. Pooled OLS model 
b. Fixed effect least squares dummy variable model 
c. Fixed effect within-group model 
d. Random effect model 
8. In which of the following models does the intercept vary across subjects and over time?
a. Pooled OLS model 
b. Fixed effect least squares dummy variable model 
c. Fixed effect within-group model 
d. Random effect model 
9. In which of the following models are both the dependent and explanatory variables expressed in terms
of deviations from their respective mean values?
a. Pooled OLS model
b. Fixed effect least squares dummy variable model 
c. Fixed effect within-group model 
d. Random effect model 
10. In which of the following models does the composite error term have two error components, the first
representing the cross-section-specific error component and the second the combined time-series and
cross-section error component?
a. Pooled OLS model 
b. Fixed effect least squares dummy variable model 
c. Fixed effect within-group model 
d. Random effect model 
11. A one-way fixed effects model is one in which
a. The intercept is fixed over subject but not over time 
b. The intercept is fixed over time but not over subjects 
c. The intercept is fixed over both time and subjects 
d. The intercept is variant over both time and subject 
12. A two-way fixed effects model is one in which
a. The intercept is fixed over subject but not over time 
b. The intercept is fixed over time but not over subjects 
c. The intercept is fixed over both time and subjects 
d. The intercept is variant over both time and subject 


13. One disadvantage of the fixed effects within-group estimation method relative to the ordinary pooled
regression model is that the slope coefficients are
a. Efficient but inconsistent 
b. Consistent but inefficient 
c. Inconsistent and inefficient 
d. Consistent and efficient 
14. In general, when we difference a variable, we remove the long-run component from that variable. What
is left is the short-run value of that variable. This statement 
a. is true 
b. is false 
c. Depends on the time series studied 
d. Depends on the cross-section studied 
15. When dealing with the random effects model, the most appropriate procedure to adopt is
a. OLS procedure
b. GLS procedure
c. Seemingly unrelated regression procedure
d. Two-stage least squares procedure
16. The H0 we test using the Hausman statistic is that
a. FEM and REM estimators differ substantially 
b. FEM and REM estimators do not differ substantially 
c. FEM and REM estimators are equal to zero 
d. FEM and REM estimators are not equal to zero 
17. The Hausman test statistic follows a
a. Normal distribution
b. t-distribution
c. χ² distribution
d. F-distribution 
18. In the Hausman test, rejecting H0 means
a. The fixed effects model is preferred to the error components model
b. ECM is preferred to FEM 
c. Both ECM and FEM are equally preferred 
d. Neither ECM nor FEM are preferred 
19. If the individual error component is found to be correlated with one or more X variables, the estimates
obtained will be unbiased if 
a. Random effects model is estimated 
b. Fixed effects model is estimated 
c. Pooled regression 
d. All of these are same 
20. Under which of the following conditions do we find REM estimators to be more efficient than FEM
estimators, assuming that all underlying assumptions are satisfied? 
a. When N is large and T is small 
b. When N is small and T is large 
c. When N is equal to T 
d. Under all the above conditions 
Exercises


Questions 


16.1. What are the special features of (a) cross-section data, (b) time series data, and (c) panel data?

16.2. What is meant by a fixed effects model (FEM)? Since panel data have both time and space dimensions,
how does FEM allow for both dimensions?

16.3. What is meant by an error components model (ECM)? How does it differ from FEM? When is ECM
appropriate? And when is FEM appropriate?

16.4. Is there a difference between LSDV, within-estimator, and first-difference models?

16.5. When are panel data regression models inappropriate? Give examples.

16.6. How would you extend model (16.4.2) to allow for a time error component? Write down the model
explicitly.

16.7. Refer to the data on eggs produced and their prices given in Table 1.1. Which model may be appropriate
here, FEM or ECM? Why?

16.8. For the investment data given in Table 1.2, which model would you choose, FEM or REM? Why?

16.9. Based on the Michigan Income Dynamics Study, Hausman attempted to estimate a wage, or earnings,
model using a sample of 629 high school graduates, who were followed for a period of six years, thus
giving in all 3,774 observations. The dependent variable in this study was the logarithm of the wage, and
the explanatory variables were: age (divided into several age groups); unemployment in the previous
year; poor health in the previous year; self-employment; region of residence (for a graduate from the
South, South = 1 and 0 otherwise); and area of residence (for a graduate from a rural area, Rural = 1
and 0 otherwise). Hausman used both FEM and ECM. The results are given in Table 16.15 (standard
errors in parentheses).


Table 16.15 Wage Equations (Dependent Variable: Log Wage) 


Variable                        Fixed Effects       Random Effects
1. Age 1 (20-35)                0.0557 (0.0042)     0.0393 (0.0033)
2. Age 2 (35-45)                0.0351 (0.0051)     0.0092 (0.0036)
3. Age 3 (45-55)                0.0209 (0.0055)     -0.0007 (0.0042)
4. Age 4 (55-65)                0.0209 (0.0078)     -0.0097 (0.0060)
5. Age 5 (65- )                 -0.0171 (0.0155)    -0.0423 (0.0121)
6. Unemployed previous year     -0.0042 (0.0153)    -0.0277 (0.0151)
7. Poor health previous year    -0.0204 (0.0221)    -0.0250 (0.0215)
8. Self-employment              -0.2190 (0.0297)    -0.2670 (0.0263)
9. South                        -0.1569 (0.0656)    -0.0324 (0.0333)
10. Rural                       -0.0101 (0.0317)    -0.1215 (0.0237)
11. Constant                    n.a.                0.8499 (0.0433)
SE                              0.0567              0.0694
Degrees of freedom              3,135               3,763


Source: Reproduced from Cheng Hsiao, Analysis of Panel Data, Cambridge University Press, 1986, p. 42. Original source: J. A. Hausman,
“Specification Tests in Econometrics,” Econometrica, vol. 46, 1978, pp. 1251-1271.

a. Do the results make economic sense? 

b. Is there a vast difference in the results produced by the two models? If so, what might account for 
these differences? 

c. On the basis of the data given in the table, which model, if any, would you choose? 
Empirical Exercises


16.10. Refer to the airline example discussed in the text. Instead of the linear model given in Eq. (16.4.2),
estimate a log-linear regression model and compare your results with those given in Table 16.2.

16.11. Refer to the data in Table 1.1.

a. Let Y = labour productivity (in Rs lakh per worker) and X = wages (in Rs lakh). Estimate the model
for the years 2007-08 and 2008-09 separately.

b. Pool the observations for the two years and estimate the pooled regression. What assumptions are
you making in pooling the data?

c. Use the fixed effects model, distinguishing the two years, and present the regression results.

d. Can you use the fixed effects model, distinguishing the 27 states? Why or why not?

e. Would it make sense to distinguish both the state effect and the year effect? If so, how many
dummy variables would you have to introduce?

f. Would the error components model be appropriate to model labour productivity? Why or why
not? See if you can estimate such a model using, say, EViews.

16.12. Continue with Exercise 16.11. Before deciding to run the pooled regression, you want to find out
whether the data are “poolable.” For this purpose you decide to use the Chow test discussed in Chapter 8.
Show the necessary calculations involved and determine if the pooled regression makes any sense.

16.13. Use the investment data given in Table 1.2.

a. Estimate the Grunfeld investment function for each company individually.

b. Now pool the data for all the companies and estimate the Grunfeld investment function by OLS.

c. Use LSDV to estimate the investment function and compare your results with the pooled regression
estimated in (b).

d. How would you decide between the pooled regression and the LSDV regression? Show the
necessary calculations.

16.14. Table 16.16 gives data on the index of hourly compensation in manufacturing in U.S. dollars, Y (1992 = 100),
and the civilian unemployment rate, X (%), for Canada, the United Kingdom, and the United States for the
period 1980-2006. Consider the model:

$$Y_{it} = \beta_1 + \beta_2 X_{it} + u_{it} \qquad (1)$$


Table 16.16 Unemployment Rate and Hourly Compensation in Manufacturing, in the United States, Canada, and the
United Kingdom, 1980-2006

Year   COMP_U.S.     UN_U.S.       COMP_CAN      UN_CAN        COMP_U.K.     UN_U.K.
1980   55.9          7.1           49.0          7.3           47.1          6.9
1981   61.6          7.6           53.8          7.5           47.5          [illegible]
1982   67.2          9.7           60.1          10.7          45.1          10.8
1983   69.3          9.6           64.3          11.6          41.9          11.5
1984   71.6          7.5           65.0          10.9          39.8          11.8
1985   [illegible]   7.2           65.0          10.2          42.3          11.4
1986   78.8          7.0           64.9          9.3           52.0          11.4
1987   81.3          6.2           69.6          8.4           64.5          10.5
1988   84.1          5.5           78.5          7.4           74.8          8.6
1989   86.6          5.3           85.5          [illegible]   [illegible]   7.3
1990   90.5          5.6           92.4          [illegible]   89.6          [illegible]
1991   95.6          6.8           100.7         9.8           [illegible]   8.9
1992   100.0         [illegible]   100.0         10.6          100.0         10.0
1993   102.0         6.9           94.8          10.8          88.8          10.4
1994   105.3         6.1           92.7          9.6           92.8          8.7
1995   107.3         5.6           [illegible]   8.6           97.3          8.7
1996   109.3         [illegible]   95.9          8.8           96.0          8.1
1997   112.2         4.9           96.7          8.4           104.1         7.0
1998   118.7         4.5           94.9          [illegible]   113.8         6.3
1999   123.4         4.2           96.8          7.0           117.5         6.0
2000   134.7         4.0           100.0         6.1           114.8         5.5
2001   137.8         4.7           98.9          6.5           114.7         [illegible]
2002   147.8         5.8           101.0         7.0           126.8         5.2
2003   158.2         6.0           116.7         6.9           145.2         5.0
2004   161.5         [illegible]   [illegible]   6.4           171.4         4.8
2005   168.3         [illegible]   141.8         6.0           177.4         4.8
2006   172.4         4.6           155.5         5.5           192.3         [illegible]

Notes: UN = Unemployment rate, %.
COMP = Index of hourly compensation in U.S. dollars, 1992 = 100.
CAN = Canada.
Source: Economic Report of the President, January 2008, Table B-109.


a. A priori, what is the expected relationship between Y and X? Why?
b. Estimate the model given in Eq. (1) for each country.
c. Estimate the model, pooling all of the 81 observations.
d. Estimate the fixed effects model.
e. Estimate the error components model.
f. Which is a better model, FEM or ECM? Justify your answer. (Hint: Apply the Hausman test.)

16.15. Baltagi and Griffin considered the following gasoline demand function:22

$$\ln Y_{it} = \beta_1 + \beta_2 \ln X_{2it} + \beta_3 \ln X_{3it} + \beta_4 \ln X_{4it} + u_{it}$$

where Y = gasoline consumption per car, X_2 = real income per capita, X_3 = real gasoline price,
X_4 = number of cars per capita, i = country code, in all 18 OECD countries, and t = time (annual
observations from 1960-1978). Note: The values in the table are already in logs.

a. Estimate the above demand function pooling the data for all 18 of the countries (a total of 342 
observations). 

b. Estimate a fixed effects model using the same data. 

c. Estimate a random components model using the same data. 

d. From your analysis, which model best describes the gasoline demand in the 18 OECD countries?
Justify your answer.


22B. H. Baltagi and J. M. Griffin, “Gasoline Demand in the OECD: An Application of Pooling and Testing Procedures,” European
Economic Review, vol. 22, 1983, pp. 117-137. The data for 18 OECD countries for the years 1960-1978 can be obtained
from http://www.wiley.com/legacy/wileychi/baltagi/supp/Gasoline.dat or from the textbook website, Table 16.17.
16.16. The article by Subhayu Bandyopadhyay and Howard J. Wall, “The Determinants of Aid in the
Post-Cold War Era,” Review, Federal Reserve Bank of St. Louis, November/December 2007, vol. 89,
number 6, pp. 533-547, uses panel data to estimate the responsiveness of aid to recipient countries’
economic and physical needs, civil/political rights, and government effectiveness. The data are for
135 countries for three years. The article and data can be found at http://research.stlouisfed.org/publications/review/past/2007
in the November/December, Vol. 89, No. 10 section. The data can also
be found on the textbook website in Table 16.18. Estimate the authors’ model (given on page 534 of
their article) using a random effects estimator. Compare your results with those of the pooled and
fixed effects estimators given by the authors in Table 2 of their article. Which model is appropriate
here, fixed effects or random effects? Why?

16.17. Refer to the airlines example discussed in the text. For each airline, estimate a time series logarithmic
cost function. How do these regressions compare with the fixed effects and random effects models
discussed in the chapter? Would you also estimate 15 cross-section logarithmic cost functions? Why
or why not?


Key to Multiple Choice Questions 


1. (a) 2. (c) 3. (a) 4. (b) 5. (a) 6. (a) 7. (b) 8. (b) 9. (c)
10. (d) 11. (a) 12. (d) 13. (b) 14. (a) 15. (b) 16. (b) 17. (c) 18. (a)
19. (b) 20. (a)


CHAPTER 17


Dynamic Econometric Models: Autoregressive and Distributed-Lag Models


In regression analysis involving time series data, if the regression model includes not only the current but 
also the lagged (past) values of the explanatory variables (the X’s), it is called a distributed-lag model. If 
the model includes one or more lagged values of the dependent variable among its explanatory variables, it is 
called an autoregressive model. Thus, 


$$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + u_t$$
represents a distributed-lag model, whereas
$$Y_t = \alpha + \beta X_t + \gamma Y_{t-1} + u_t$$


is an example of an autoregressive model. The latter are also known as dynamic models since they portray 
the time path of the dependent variable in relation to its past value(s). 

Autoregressive and distributed-lag models are used extensively in econometric analysis, and in this chapter 
we take a close look at such models with a view to finding out the following: 


1. What is the role of lags in economics?

2. What are the reasons for the lags?

3. Is there any theoretical justification for the commonly used lagged models in empirical econometrics?

4. What is the relationship, if any, between autoregressive and distributed-lag models? Can one be derived from the other?

5. What are some of the statistical problems involved in estimating such models?

6. Does a lead-lag relationship between variables imply causality? If so, how does one measure it?


17.1 The Role of “Time,” or “Lag,” in Economics


In economics the dependence of a variable Y (the dependent variable) on another variable(s) X (the explan- 
atory variable) is rarely instantaneous. Very often, Y responds to X with a lapse of time. Such a lapse of time 
is called a lag. To illustrate the nature of the lag, we consider several examples. 


Example 17.1 The Consumption Function 


Suppose a person receives a salary increase of Rs. 2,000 in annual pay, and suppose that this is a “permanent” 
increase in the sense that the increase in salary is maintained. What will be the effect of this increase in income 
on the person’s annual consumption expenditure? 

Following such a gain in income, people usually do not rush to spend all the increase immediately. Thus, our recipient may decide to increase consumption expenditure by Rs. 800 in the first year following the salary increase, by another Rs. 600 in the next year, and by another Rs. 400 in the following year, saving the remainder. By the end of the third year, the person’s annual consumption expenditure will be increased by Rs. 1,800. We can thus write the consumption function as


$$Y_t = \text{constant} + 0.4X_t + 0.3X_{t-1} + 0.2X_{t-2} + u_t \tag{17.1.1}$$


where Y is consumption expenditure and X is income. 

Equation (17.1.1) shows that the effect of an increase in income of Rs. 2,000 is spread, or distributed, over 
a period of 3 years. Models such as Eq. (17.1.1) are therefore called distributed-lag models because the 
effect of a given cause (income) is spread over a number of time periods. Geometrically, the distributed-lag 
model (17.1.1) is shown in Figure 17.1, or alternatively, in Figure 17.2. 


Figure 17.1 Example of distributed lags. (Axes: consumption expenditure, Rs., against time; time points 0, $t_1$, $t_2$, $t_3$.)
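As a quick numerical check of Eq. (17.1.1), the short Python sketch below applies the illustrative lag weights 0.4, 0.3, and 0.2 to a sustained Rs. 2,000 rise in income and accumulates the extra spending year by year. The helper name `consumption_path` is hypothetical, chosen only for this illustration, and the constant term and the error term are ignored.

```python
# Illustrative lag weights beta_0, beta_1, beta_2 from Eq. (17.1.1).
LAG_WEIGHTS = [0.4, 0.3, 0.2]

def consumption_path(income_increase, weights):
    """Return the extra spending added in each year and the running total.

    The running total is the rise in annual consumption relative to the
    level before the (sustained) income increase.
    """
    added = [w * income_increase for w in weights]
    cumulative = []
    running = 0.0
    for amount in added:
        running += amount
        cumulative.append(running)
    return added, cumulative

added, cumulative = consumption_path(2_000, LAG_WEIGHTS)
for year, (a, c) in enumerate(zip(added, cumulative), start=1):
    print(f"Year {year}: +Rs. {a:,.0f}; annual consumption up by Rs. {c:,.0f}")
```

The running total reaches Rs. 1,800 by the third year; the remaining Rs. 200 of the increase is saved, as in the example.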
More generally we may write 


$$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_k X_{t-k} + u_t \tag{17.1.2}$$


which is a distributed-lag model with a finite lag of $k$ time periods. The coefficient $\beta_0$ is known as the short-run, or impact, multiplier because it gives the change in the mean value of $Y$ following a unit change in $X$ in the same time period.¹

Figure 17.2 The effect of a unit change in X at time t on Y at time t and subsequent time periods. (Vertical axis: effect on Y, with bars $\beta_0 X_t$, $\beta_1 X_t$, $\beta_2 X_t$, $\beta_3 X_t$, ...; horizontal axis: time $t$, $t+1$, $t+2$, $t+3$, $t+4$.)

If the change in $X$ is maintained at the same level thereafter, then $(\beta_0 + \beta_1)$ gives the change in (the mean value of) $Y$ in the next period, $(\beta_0 + \beta_1 + \beta_2)$ in the following period, and so on. These partial sums are called interim, or intermediate, multipliers. Finally, after $k$ periods we obtain


$$\sum_{i=0}^{k} \beta_i = \beta_0 + \beta_1 + \beta_2 + \cdots + \beta_k = \beta \tag{17.1.3}$$

which is known as the long-run, or total, distributed-lag multiplier, provided the sum $\beta$ exists (to be discussed elsewhere).

If we define

$$\beta_i^* = \frac{\beta_i}{\sum \beta_i} = \frac{\beta_i}{\beta} \tag{17.1.4}$$

we obtain “standardized” $\beta_i$. Partial sums of the standardized $\beta_i$ then give the proportion of the long-run, or total, impact felt by a certain time period.

Returning to the consumption regression (17.1.1), we see that the short-run multiplier, which is nothing but the short-run marginal propensity to consume (MPC), is 0.4, whereas the long-run multiplier, which is the long-run marginal propensity to consume, is 0.4 + 0.3 + 0.2 = 0.9. That is, following a Re 1 increase in income, the consumer will increase his or her level of consumption by about 40 paisa in the year of increase, by another 30 paisa in the next year, and by yet another 20 paisa in the following year. The long-run impact of an increase of Re 1 in income is thus 90 paisa. If we divide each $\beta_i$ by 0.9, we obtain, respectively, 0.44, 0.33, and 0.22 (to two decimals), which indicate that about 44 percent of the total impact of a unit change in X on Y is felt immediately, about 78 percent after one year, and 100 percent by the end of the second year.
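The multipliers and standardized weights of Eqs. (17.1.3) and (17.1.4) are simple partial sums and ratios. A minimal sketch, again using the illustrative coefficients of Eq. (17.1.1); the variable names are chosen only for this illustration:

```python
from itertools import accumulate

# Illustrative lag coefficients beta_0, beta_1, beta_2 from Eq. (17.1.1).
betas = [0.4, 0.3, 0.2]

impact = betas[0]                              # short-run (impact) multiplier
interim = list(accumulate(betas))              # interim multipliers: running partial sums
long_run = sum(betas)                          # long-run (total) multiplier, Eq. (17.1.3)
standardized = [b / long_run for b in betas]   # standardized weights, Eq. (17.1.4)
share_felt = list(accumulate(standardized))    # proportion of total impact felt so far

print(f"impact = {impact}, interim = {[round(s, 2) for s in interim]}, long run = {long_run:.2f}")
print("standardized weights:", [round(s, 3) for s in standardized])
print("cumulative share:", [round(s, 3) for s in share_felt])
```

The cumulative shares (about 0.444, 0.778, and 1.0) are the proportions of the long-run impact felt after zero, one, and two periods.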


Example 17.2 Creation of Bank Money (Demand Deposits) 


Suppose the Reserve Bank of India pours Rs. 1,00,000 of new money into the banking system by buying 
government securities. What will be the total amount of bank money, or demand deposits, that will be 
generated ultimately? 

Following the fractional reserve system, if we assume that the law requires banks to keep a 20 percent reserve backing for the deposits they create, then by the well-known multiplier process the total amount of demand deposits that will be generated will be equal to Rs. 1,00,000 × [1/(1 − 0.8)] = Rs. 5,00,000. Of course, Rs. 5,00,000 in demand deposits will not be created overnight. The process takes time, which can be shown schematically in Figure 17.3.

¹Technically, $\beta_0$ is the partial derivative of $Y_t$ with respect to $X_t$, $\beta_1$ that with respect to $X_{t-1}$, $\beta_2$ that with respect to $X_{t-2}$, and so forth. Symbolically, $\partial Y_t / \partial X_{t-k} = \beta_k$.

Figure 17.3 Cumulative expansion in bank deposits (initial reserve Rs. 1,00,000 and 20 percent reserve requirement). (Deposits created at successive stages of expansion: Rs. 1,00,000, Rs. 80,000, Rs. 64,000, Rs. 51,200, Rs. 40,960, and so on, from the initial stage to the final stage.)
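The stage-by-stage deposits in Figure 17.3 form a geometric series with ratio 1 − 0.2 = 0.8. The sketch below assumes, as the simple multiplier story does, that every loan is redeposited and that banks hold no excess reserves; `deposit_expansion` is a hypothetical helper name.

```python
def deposit_expansion(initial_deposit, reserve_ratio, stages):
    """New deposits created at each stage of the fractional-reserve process.

    At each stage banks lend out (1 - reserve_ratio) of the previous stage's
    deposits, which are assumed to be redeposited in full.
    """
    retention = 1.0 - reserve_ratio
    return [initial_deposit * retention**s for s in range(stages)]

stage_deposits = deposit_expansion(100_000, 0.20, stages=5)
print([round(d) for d in stage_deposits])  # → [100000, 80000, 64000, 51200, 40960]

# Cumulative deposits approach initial / reserve_ratio = 1,00,000 / 0.2 = 5,00,000.
for n in (5, 20, 100):
    print(n, "stages:", round(sum(deposit_expansion(100_000, 0.20, n))))
```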


Example 17.3 Link between Money and Prices 


According to the monetarists, inflation is essentially a monetary phenomenon in the sense that a continuous increase in the general price level is due to the rate of expansion in money supply far in excess of the amount of money actually demanded by the economic units. Of course, this link between inflation and changes in money supply is not instantaneous. Studies have shown that the lag between the two is anywhere from 3 to about 20 quarters. The results of one such study are shown in Table 17.1,² where we see that the effect of a 1 percent change in the M1B money supply (= currency + checkable deposits at financial institutions) is felt over a period of 20 quarters. The long-run impact of a 1 percent change in the money supply on inflation is about 1 ($= \sum m_i$), which is statistically significant, whereas the short-run impact is about 0.04, which is not significant, although the intermediate multipliers seem to be generally significant. Incidentally, note that since P and M are both in percent forms, the $m_i$ ($\beta_i$ in our usual notation) give the elasticity of P with respect to M, that is, the percent response of prices to a 1 percent increase in the money supply. Thus, $m_0 = 0.041$ means that a 1 percent increase in the money supply raises prices by only about 0.04 percent in the short run; that is, the short-run elasticity is about 0.04. The long-run elasticity is about 1.03, implying that in the long run a 1 percent increase in the money supply is reflected by just about the same percentage increase in the prices. In short, a 1 percent increase in the money supply is accompanied in the long run by a 1 percent increase in the inflation rate.


Table 17.1 Estimate of Money-Price Equation: Original Specification
Sample period: 1955-I to 1969-IV: $m_{21} = 0$

$$P_t = \underset{(0.395)}{-0.146} + \sum_{i=0}^{20} m_i M_{t-i}$$

        Coeff.   |t|           Coeff.   |t|                Coeff.    |t|
m0      0.041    1.276   m8    0.048    3.249   m16        0.069     3.943
m1      0.034    1.538   m9    0.054    3.783   m17        0.062     3.712
m2      0.030    1.903   m10   0.059    4.305   m18        0.053     3.511
m3      0.029    2.171   m11   0.065    4.673   m19        0.039     3.338
m4      0.030    2.235   m12   0.069    4.795   m20        0.022     3.191
m5      0.033    2.294   m13   0.072    4.694   Σmi        1.031     7.870
m6      0.037    2.475   m14   0.073    4.468   Mean lag   10.959    5.634
m7      0.042    2.798   m15   0.072    4.202

R² 0.525     s.e. 1.066     D.W. 2.00

Notation: P = compounded annual rate of change of GNP deflator.
M = compounded annual rate of change of M1B.

Source: Keith M. Carlson, “The Lag from Money to Prices,” Review, Federal Reserve Bank of St. Louis, October 1980, Table 1, p. 4.

²Keith M. Carlson, “The Lag from Money to Prices,” Review, Federal Reserve Bank of St. Louis, October 1980, Table 1, p. 4.


Example 17.4 Lag between R&D Expenditure and Productivity 


The decision to invest in research and development (R&D) expenditure and its ultimate payoff in terms 
of increased productivity involve considerable lag, actually several lags, such as, “... the lag between the 
investment of funds and the time inventions actually begin to appear, the lag between the invention of an 
idea or device and its development up to a commercially applicable stage, and the lag which is introduced 
by the process of diffusion: it takes time before all the old machines are replaced by the better new ones.”³


Example 17.5 The J Curve of International Economics 


Students of international economics are familiar with what is called the J curve, which shows the relationship 
between trade balance and depreciation of currency. Following depreciation of a country’s currency (e.g., due 
to devaluation), initially the trade balance deteriorates but eventually it improves, assuming other things are 
the same. The curve is as shown in Figure 17.4. 


Figure 17.4 The J curve. (Vertical axis: current account, in domestic output units; horizontal axis: time. Annotations mark the point where real depreciation takes place and the J curve begins, the end of the J curve, and the long-run effect of real depreciation on the current account.)

Source: Paul R. Krugman and Maurice Obstfeld, International Economics: Theory and Practice, 3d ed., HarperCollins, New York, 1994, p. 465.


³Zvi Griliches, “Distributed Lags: A Survey,” Econometrica, vol. 36, no. 1, January 1967, pp. 16-49.

Example 17.6 The Accelerator Model of Investment


In its simplest form, the acceleration principle of investment theory states that investment is proportional to 
changes in output. Symbolically, 


$$I_t = \beta(X_t - X_{t-1}) \qquad \beta > 0 \tag{17.1.5}$$


where $I_t$ is investment at time $t$, $X_t$ is output at time $t$, and $X_{t-1}$ is output at time $(t - 1)$.
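Eq. (17.1.5) is easy to apply to a series of outputs. The output figures and the value $\beta = 2$ below are hypothetical and serve only to show that investment tracks the change, not the level, of output; `accelerator_investment` is an illustrative name.

```python
def accelerator_investment(output, beta):
    """I_t = beta * (X_t - X_{t-1}) for t = 1, ..., n - 1 (Eq. 17.1.5)."""
    if beta <= 0:
        raise ValueError("the accelerator principle assumes beta > 0")
    return [beta * (curr - prev) for prev, curr in zip(output, output[1:])]

# Hypothetical output series and accelerator coefficient, for illustration only.
output = [100, 110, 125, 125, 120]
print(accelerator_investment(output, beta=2.0))  # → [20.0, 30.0, 0.0, -10.0]
```

In this simple form a constant output level implies zero investment, and a fall in output implies negative (net) investment.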


The preceding examples are only a sample of the use of lag in economics. Undoubtedly, the reader can 
produce several examples from his or her own experience. 


17.2 The Reasons for Lags⁴


Although the examples cited in Section 17.1 point out the nature of lagged phenomena, they do not fully 
explain why lags occur. There are three main reasons: 


1. Psychological reasons. As a result of the force of habit (inertia), people do not change their consumption 
habits immediately following a price decrease or an income increase perhaps because the process 
of change may involve some immediate disutility. Thus, those who become instant millionaires by 
winning lotteries may not change the lifestyles to which they were accustomed for a long time because 
they may not know how to react to such a windfall gain immediately. Of course, given reasonable 
time, they may learn to live with their newly acquired fortune. Also, people may not know whether a 
change is “permanent” or “transitory.” Thus, my reaction to an increase in my income will depend on 
whether or not the increase is permanent. If it is only a nonrecurring increase and in succeeding periods 
my income returns to its previous level, I may save the entire increase, whereas someone else in my 
position might decide to “live it up.” 

2. Technological reasons. Suppose the price of capital relative to labor declines, making substitution of 
capital for labor economically feasible. Of course, addition of capital takes time (the gestation period). 
Moreover, if the drop in price is expected to be temporary, firms may not rush to substitute capital for 
labor, especially if they expect that after the temporary drop the price of capital may increase beyond 
its previous level. Sometimes, imperfect knowledge also accounts for lags. At present the market for 
personal computers is glutted with all kinds of computers with varying features and prices. Moreover, 
since their introduction in the late 1970s, the prices of most personal computers have dropped dramati- 
cally. As a result, prospective consumers for the personal computer may hesitate to buy until they have 
had time to look into the features and prices of all the competing brands. Moreover, they may hesitate 
to buy in the expectation of further decline in price or innovations. 

3. Institutional reasons. These reasons also contribute to lags. For example, contractual obligations may 
prevent firms from switching from one source of labor or raw material to another. As another example, 
those who have placed funds in long-term savings accounts for fixed durations such as one year, three 
years, or seven years are essentially “locked in” even though money market conditions may be such that 
higher yields are available elsewhere. Similarly, employers often give their employees a choice among 
several health insurance plans, but once a choice is made, an employee may not switch to another plan 
for at least one year. Although this may be done for administrative convenience, the employee is locked 
in for one year. 


⁴This section leans heavily on Marc Nerlove, Distributed Lags and Demand Analysis for Agricultural and Other Commodities, Agricultural Handbook No. 141, U.S. Department of Agriculture, June 1958.

For the reasons just discussed, lag occupies a central role in economics. This is clearly reflected in the short-run/long-run methodology of economics. It is for this reason that we say that short-run price or income elasticities are generally smaller (in absolute value) than the corresponding long-run elasticities or that the short-run marginal propensity to consume is generally smaller than the long-run marginal propensity to consume.


17.3 Estimation of Distributed-Lag Models 


Granted that distributed-lag models play a highly useful role in economics, how does one estimate such models? Specifically, suppose we have the following distributed-lag model in one explanatory variable:⁵


$$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + u_t \tag{17.3.1}$$


where we have not defined the length of the lag, that is, how far back into the past we want to go. Such a 
model is called an infinite (lag) model, whereas a model of the type shown in Eq. (17.1.2) is called a finite 
(lag) distributed-lag model because the length of the lag k is specified. We shall continue to use Eq. (17.3.1) 
because it is easy to handle mathematically, as we shall see.⁶

How do we estimate the $\alpha$ and $\beta$’s of Eq. (17.3.1)? We may adopt two approaches: (1) ad hoc estimation and (2) a priori restrictions on the $\beta$’s by assuming that the $\beta$’s follow some systematic pattern. We shall consider ad hoc estimation in this section and the other approach in Section 17.4.


Ad Hoc Estimation of Distributed-Lag Models 


Since the explanatory variable $X_t$ is assumed to be nonstochastic (or at least uncorrelated with the disturbance term $u_t$), $X_{t-1}$, $X_{t-2}$, and so on, are nonstochastic, too. Therefore, in principle, ordinary least squares (OLS) can be applied to Eq. (17.3.1). This is the approach taken by Alt⁷ and Tinbergen.⁸ They suggest that to estimate Eq. (17.3.1) one may proceed sequentially; that is, first regress $Y_t$ on $X_t$, then regress $Y_t$ on $X_t$ and $X_{t-1}$, then regress $Y_t$ on $X_t$, $X_{t-1}$, and $X_{t-2}$, and so on. This sequential procedure stops when the regression coefficients of the lagged variables start becoming statistically insignificant and/or the coefficient of at least one of the variables changes signs from positive to negative or vice versa. Following this precept, Alt regressed fuel oil consumption Y on new orders X. Based on the quarterly data for the period 1930-1939, the results were as follows:


PBT 0.191% 

Ŷ, = 8.27 + 0.111X; + Oued 

AEO Oia EE N 

AA E A OS. C Ar I EIA 


w 


Alt chose the second regression as the “best” one because in the last two equations the sign of $X_{t-2}$ was not stable and in the last equation the sign of $X_{t-3}$ was negative, which may be difficult to interpret economically.


⁵If there is more than one explanatory variable in the model, each variable may have a lagged effect on Y. For simplicity only, we assume one explanatory variable.

⁶In practice, however, the coefficients of the distant X values are expected to have a negligible effect on Y.
⁷F. F. Alt, “Distributed Lags,” Econometrica, vol. 10, 1942, pp. 113-128.
⁸J. Tinbergen, “Long-Term Foreign Trade Elasticities,” Metroeconomica, vol. 1, 1949, pp. 174-185.

Although seemingly straightforward, ad hoc estimation suffers from many drawbacks, such as the
following: 


1. There is no a priori guide as to what is the maximum length of the lag.⁹

2. As one estimates successive lags, there are fewer degrees of freedom left, making statistical inference somewhat shaky. Economists are not usually lucky enough to have a data series long enough to go on estimating numerous lags.

3. More importantly, in economic time series data, successive values (lags) tend to be highly correlated; hence multicollinearity rears its ugly head. As noted in Chapter 10, multicollinearity leads to imprecise estimation; that is, the standard errors tend to be large in relation to the estimated coefficients. As a result, based on the routinely computed t ratios, we may tend to declare (erroneously) that a lagged coefficient(s) is statistically insignificant.

4. The sequential search for the lag length opens the researcher to the charge of data mining. Also, as we noted in Section 13.4, the nominal and true level of significance to test statistical hypotheses becomes an important issue in such sequential searches (see Eq. [13.4.2]).


In view of the preceding problems, the ad hoc estimation procedure has very little to recommend it. Clearly, some prior or theoretical considerations must be brought to bear upon the various $\beta$’s if we are to make headway with the estimation problem.


17.4 The Koyck Approach to Distributed-Lag Models 


Koyck has proposed an ingenious method of estimating distributed-lag models. Suppose we start with the infinite lag distributed-lag model (17.3.1). Assuming that the $\beta$’s are all of the same sign, Koyck assumes that they decline geometrically as follows:¹⁰

$$\beta_k = \beta_0 \lambda^k \qquad k = 0, 1, \ldots \tag{17.4.1}$$

where $\lambda$, such that $0 < \lambda < 1$, is known as the rate of decline, or decay, of the distributed lag and where $1 - \lambda$ is known as the speed of adjustment.¹¹

What Eq. (17.4.1) postulates is that each successive $\beta$ coefficient is numerically less than each preceding $\beta$ (this statement follows since $\lambda < 1$), implying that as one goes back into the distant past, the effect of that lag on $Y_t$ becomes progressively smaller, a quite plausible assumption. After all, current and recent past incomes are expected to affect current consumption expenditure more heavily than income in the distant past. Geometrically, the Koyck scheme is depicted in Figure 17.5.

As this figure shows, the value of the lag coefficient $\beta_k$ depends, apart from the common $\beta_0$, on the value of $\lambda$. The closer $\lambda$ is to 1, the slower the rate of decline in $\beta_k$, whereas the closer it is to zero, the more rapid the decline in $\beta_k$. In the former case, distant past values of X will exert sizable impact on $Y_t$, whereas in the latter case their influence on $Y_t$ will peter out quickly. This pattern can be seen clearly from the following illustration:


⁹If the lag length, k, is incorrectly specified, we will have to contend with the problem of misspecification errors discussed in Chapter 13. Also keep in mind the warning about data mining.
¹⁰L. M. Koyck, Distributed Lags and Investment Analysis, North Holland Publishing Company, Amsterdam, 1954.

¹¹Sometimes this is also written as

$$\beta_k = \beta_0 (1 - \lambda)\lambda^k \qquad k = 0, 1, \ldots$$

for reasons given in footnote 12.

Figure 17.5 Koyck scheme (declining geometric distribution). (Vertical axis: $\beta_k$; horizontal axis: lag (time), starting at 0.)


λ      β₀     β₁        β₂        β₃        β₄         β₅         ...    β₁₀
0.75   β₀     0.75β₀    0.56β₀    0.42β₀    0.32β₀     0.24β₀     ...    0.06β₀
0.25   β₀     0.25β₀    0.06β₀    0.02β₀    0.004β₀    0.001β₀    ...    0.0


Note these features of the Koyck scheme: (1) by assuming nonnegative values for $\lambda$, Koyck prevents the $\beta$’s from changing sign; (2) by assuming $\lambda < 1$, he gives lesser weight to the distant $\beta$’s than the current ones; and (3) he ensures that the sum of the $\beta$’s, which gives the long-run multiplier, is finite,¹² namely,

$$\sum_{k=0}^{\infty} \beta_k = \beta_0 \left(\frac{1}{1 - \lambda}\right) \tag{17.4.2}$$
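The weights in the illustration above and the limit in Eq. (17.4.2) can be reproduced directly. In this sketch `koyck_weights` is a hypothetical helper, and every weight is expressed as a multiple of $\beta_0$ by setting $\beta_0 = 1$; the truncation at 200 lags is an arbitrary choice that leaves a negligible tail for these values of $\lambda$.

```python
def koyck_weights(beta0, lam, n_lags):
    """Lag coefficients beta_k = beta0 * lam**k for k = 0, ..., n_lags (Eq. 17.4.1)."""
    if not 0 < lam < 1:
        raise ValueError("the Koyck scheme requires 0 < lambda < 1")
    return [beta0 * lam**k for k in range(n_lags + 1)]

BETA0 = 1.0  # report every weight as a multiple of beta_0

for lam in (0.75, 0.25):
    w = koyck_weights(BETA0, lam, 10)
    print(f"lambda = {lam}: beta_0..beta_5 =", [round(x, 3) for x in w[:6]],
          f"beta_10 = {w[10]:.3f}")
    # Partial sums approach the long-run multiplier beta_0 / (1 - lambda), Eq. (17.4.2).
    approx = sum(koyck_weights(BETA0, lam, 200))
    print(f"  sum of first 201 weights = {approx:.6f}, "
          f"beta_0/(1 - lambda) = {BETA0 / (1 - lam):.6f}")
```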


As a result of Eq. (17.4.1), the infinite lag model (17.3.1) may be written as 


$$Y_t = \alpha + \beta_0 X_t + \beta_0 \lambda X_{t-1} + \beta_0 \lambda^2 X_{t-2} + \cdots + u_t \tag{17.4.3}$$


As it stands, the model is still not amenable to easy estimation since a large (literally infinite) number of parameters remain to be estimated and the parameter $\lambda$ enters in a highly nonlinear form: strictly speaking, the method of linear (in the parameters) regression analysis cannot be applied to such a model. But now Koyck suggests an ingenious way out. He lags Eq. (17.4.3) by one period to obtain


$$Y_{t-1} = \alpha + \beta_0 X_{t-1} + \beta_0 \lambda X_{t-2} + \beta_0 \lambda^2 X_{t-3} + \cdots + u_{t-1} \tag{17.4.4}$$


¹²This is because

$$\sum_{k=0}^{\infty} \beta_k = \beta_0 (1 + \lambda + \lambda^2 + \lambda^3 + \cdots) = \beta_0 \left(\frac{1}{1 - \lambda}\right)$$

since the expression in the parentheses on the right side is an infinite geometric series whose sum is $1/(1 - \lambda)$ provided $0 < \lambda < 1$. In passing, note that if $\beta_k$ is as defined in footnote 11, $\sum \beta_k = \beta_0 (1 - \lambda)/(1 - \lambda) = \beta_0$, thus ensuring that the weights $(1 - \lambda)\lambda^k$ sum to 1.

He then multiplies Eq. (17.4.4) by $\lambda$ to obtain

$$\lambda Y_{t-1} = \lambda\alpha + \lambda\beta_0 X_{t-1} + \beta_0 \lambda^2 X_{t-2} + \beta_0 \lambda^3 X_{t-3} + \cdots + \lambda u_{t-1} \tag{17.4.5}$$

Subtracting Eq. (17.4.5) from Eq. (17.4.3), he gets

$$Y_t - \lambda Y_{t-1} = \alpha(1 - \lambda) + \beta_0 X_t + (u_t - \lambda u_{t-1}) \tag{17.4.6}$$

or, rearranging,

$$Y_t = \alpha(1 - \lambda) + \beta_0 X_t + \lambda Y_{t-1} + v_t \tag{17.4.7}$$


where $v_t = (u_t - \lambda u_{t-1})$, a moving average of $u_t$ and $u_{t-1}$.
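Because Eqs. (17.4.3) to (17.4.7) are pure algebra, the Koyck transformation can be checked numerically. The sketch below makes illustrative assumptions that are not in the text: hypothetical values $\alpha = 5$, $\beta_0 = 0.6$, $\lambda = 0.7$; randomly drawn X values; $X = 0$ before the sample starts (so the infinite sum in Eq. (17.4.3) becomes finite); and $u_t = 0$, so that only the systematic part is compared.

```python
import random

alpha, beta0, lam = 5.0, 0.6, 0.7   # hypothetical parameter values
random.seed(1)
x = [random.uniform(0, 10) for _ in range(50)]  # X_t for t = 0..49; X = 0 before t = 0

# Eq. (17.4.3) with u_t = 0: Y_t = alpha + beta0 * sum_k lam**k * X_{t-k}
y = [alpha + beta0 * sum(lam**k * x[t - k] for k in range(t + 1))
     for t in range(len(x))]

# Eq. (17.4.7) without the error term: Y_t = alpha(1 - lam) + beta0 X_t + lam Y_{t-1}
max_gap = max(abs(y[t] - (alpha * (1 - lam) + beta0 * x[t] + lam * y[t - 1]))
              for t in range(1, len(x)))
print(f"largest discrepancy between the two forms: {max_gap:.2e}")
```

The discrepancy is at the level of floating-point rounding: the infinite-lag form and the three-parameter autoregressive form describe the same systematic relationship.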

The procedure just described is known as the Koyck transformation. Comparing Eq. (17.4.7) with Eq. (17.3.1), we see the tremendous simplification accomplished by Koyck. Whereas before we had to estimate $\alpha$ and an infinite number of $\beta$’s, we now have to estimate only three unknowns: $\alpha$, $\beta_0$, and $\lambda$. Now there is no reason to expect multicollinearity. In a sense, multicollinearity is resolved by replacing $X_{t-1}, X_{t-2}, \ldots$ by a single variable, namely, $Y_{t-1}$. But note the following features of the Koyck transformation:


1. We started with a distributed-lag model but ended up with an autoregressive model because $Y_{t-1}$ appears as one of the explanatory variables. This transformation shows how one can “convert” a distributed-lag model into an autoregressive model.

2. The appearance of $Y_{t-1}$ is likely to create some statistical problems. $Y_{t-1}$, like $Y_t$, is stochastic, which means that we have a stochastic explanatory variable in the model. Recall that the classical least-squares theory is predicated on the assumption that the explanatory variables either are nonstochastic or, if stochastic, are distributed independently of the stochastic disturbance term. Hence, we must find out if $Y_{t-1}$ satisfies this assumption. (We shall return to this point in Section 17.8.)

3. In the original model (17.3.1) the disturbance term was $u_t$, whereas in the transformed model it is $v_t = (u_t - \lambda u_{t-1})$. The statistical properties of $v_t$ depend on what is assumed about the statistical properties of $u_t$, for, as shown later, if the original $u_t$’s are serially uncorrelated, the $v_t$’s are serially correlated. Therefore, we may have to face up to the serial correlation problem in addition to the stochastic explanatory variable $Y_{t-1}$. We shall do that in Section 17.8.

4. The presence of lagged Y violates one of the assumptions underlying the Durbin-Watson d test. Therefore, we will have to develop an alternative to test for serial correlation in the presence of lagged Y. One alternative is the Durbin h test, which is discussed in Section 17.10.
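Point 3 can be illustrated by simulation. Assuming, for illustration only, that the $u_t$ are independent standard normal draws and that $\lambda = 0.8$, the sketch compares the sample first-order autocorrelation of $u_t$ and of $v_t = u_t - \lambda u_{t-1}$ with the theoretical value $-\lambda/(1 + \lambda^2)$, which follows from $\operatorname{cov}(v_t, v_{t-1}) = -\lambda\sigma^2$ and $\operatorname{var}(v_t) = (1 + \lambda^2)\sigma^2$. It uses NumPy; `first_order_autocorr` is an illustrative helper name.

```python
import numpy as np

def first_order_autocorr(z):
    """Sample correlation between z_t and z_{t-1}."""
    return np.corrcoef(z[1:], z[:-1])[0, 1]

lam = 0.8
rng = np.random.default_rng(0)
u = rng.standard_normal(200_000)   # serially uncorrelated u_t (assumption)
v = u[1:] - lam * u[:-1]           # v_t = u_t - lam * u_{t-1}

print(f"autocorrelation of u: {first_order_autocorr(u):+.3f}")
print(f"autocorrelation of v: {first_order_autocorr(v):+.3f}")
print(f"theory for v        : {-lam / (1 + lam**2):+.3f}")
```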




As we saw in Eq. (17.1.4), the partial sums of the standardized $\beta_k$ tell us the proportion of the long-run, or total, impact felt by a certain time period. In practice, though, the mean or median lag is often used to characterize the nature of the lag structure of a distributed-lag model.


The Median Lag 


The median lag is the time required for the first half, or 50 percent, of the total change in Y following a unit sustained change in X. For the Koyck model, the median lag is as follows (see Exercise 17.6):

$$\text{Koyck model: Median lag} = -\frac{\log 2}{\log \lambda} \tag{17.4.8}$$

Thus, if $\lambda = 0.2$ the median lag is about 0.4307, but if $\lambda = 0.8$ the median lag is about 3.1063. Verbally, in the former case 50 percent of the total change in Y is accomplished in less than half a period, whereas in the latter case it takes more than 3 periods to accomplish the 50 percent change. But this contrast should not be surprising, for as we know, the higher the value of $\lambda$ the lower the speed of adjustment, and the lower the value of $\lambda$ the greater the speed of adjustment.
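Eq. (17.4.8) is a one-line computation. The sketch below (a hypothetical helper, not code from the text) evaluates it for the two values of $\lambda$ just discussed and for $\lambda = 0.7971$, the estimate used in Example 17.7. The ratio does not depend on the base of the logarithm, and by construction the median lag $m$ satisfies $\lambda^m = 1/2$.

```python
import math

def koyck_median_lag(lam):
    """Median lag of the Koyck scheme, Eq. (17.4.8): -log 2 / log lambda."""
    if not 0 < lam < 1:
        raise ValueError("requires 0 < lambda < 1")
    return -math.log(2) / math.log(lam)

for lam in (0.2, 0.8, 0.7971):
    m = koyck_median_lag(lam)
    # By construction lam**m equals 1/2 (up to rounding).
    print(f"lambda = {lam}: median lag = {m:.4f}, lambda**m = {lam**m:.4f}")
```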


The Mean Lag 
Provided all $\beta_k$ are positive, the mean, or average, lag is defined as

$$\text{Mean lag} = \frac{\sum_{k=0}^{\infty} k\beta_k}{\sum_{k=0}^{\infty} \beta_k} \tag{17.4.9}$$

which is simply the weighted average of all the lags involved, with the respective $\beta$ coefficients serving as weights. In short, it is a lag-weighted average of time. For the Koyck model the mean lag is (see Exercise 17.7)

$$\text{Koyck model: Mean lag} = \frac{\lambda}{1 - \lambda} \tag{17.4.10}$$

Thus, if $\lambda = \frac{1}{2}$, the mean lag is 1.
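The closed form in Eq. (17.4.10) can be compared with the definition in Eq. (17.4.9) by truncating the infinite sums at a long lag. A minimal sketch with illustrative function names; the truncation point of 500 lags is an arbitrary assumption, harmless here because $\lambda^{500}$ is negligible for the values used, and $\beta_0$ cancels from the ratio.

```python
def mean_lag(weights):
    """Eq. (17.4.9): sum_k k * beta_k / sum_k beta_k (all weights assumed positive)."""
    return sum(k * b for k, b in enumerate(weights)) / sum(weights)

def koyck_mean_lag(lam):
    """Closed form for the Koyck scheme, Eq. (17.4.10)."""
    return lam / (1 - lam)

for lam in (0.5, 0.7971):
    # beta_0 cancels from the ratio, so the weights lam**k suffice.
    truncated = mean_lag([lam**k for k in range(500)])
    print(f"lambda = {lam}: truncated sum = {truncated:.4f}, "
          f"closed form = {koyck_mean_lag(lam):.4f}")
```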

From the preceding discussion it is clear that the median and mean lags serve as a summary measure of 
the speed with which Y responds to X. In the example given in Table 17.1 the mean lag is about 11 quarters, 
showing that it takes quite some time, on the average, for the effect of changes in the money supply to be felt 
on price changes. 
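The summary rows of Table 17.1 can be recomputed from the 21 reported coefficients using Eq. (17.4.9). Because the published $m_i$ are rounded to three decimals, the recomputed sum (about 1.033) and mean lag (about 10.95 quarters) differ slightly from the reported 1.031 and 10.959.

```python
# Estimated lag coefficients m_0, ..., m_20 from Table 17.1 (Carlson, 1980).
m = [0.041, 0.034, 0.030, 0.029, 0.030, 0.033, 0.037, 0.042,
     0.048, 0.054, 0.059, 0.065, 0.069, 0.072, 0.073, 0.072,
     0.069, 0.062, 0.053, 0.039, 0.022]

long_run = sum(m)                                            # long-run multiplier
mean_lag = sum(i * mi for i, mi in enumerate(m)) / long_run  # Eq. (17.4.9)

print(f"sum of m_i = {long_run:.3f}")            # table reports 1.031
print(f"mean lag   = {mean_lag:.2f} quarters")   # table reports 10.959
```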


Example 17.7 Per Capita Personal Consumption Expenditure (PPCE) and Per Capita Personal 
Disposable Income (PPDI) 


This example examines PPCE in relation to PPDI, both expressed in 2000 dollars, for the United States for the 
period 1959-2006. As an illustration of the Koyck model, consider the data given in Table 17.2. Regression of 
PPCE on PPDI and lagged PPCE gives the results shown in Table 17.3. 

The consumption function in this table can be called the short-run consumption function. We will derive 
the long-run consumption function shortly. 

Using the estimated value of $\lambda$, we can compute the distributed-lag coefficients. If $\hat\beta_0 = 0.2139$, then $\hat\beta_1 = (0.2139)(0.7971) \approx 0.1705$, $\hat\beta_2 = (0.2139)(0.7971)^2 \approx 0.1359$, and so on, which are short- and medium-term multipliers. Finally, using Eq. (17.4.2), we can obtain the long-run multiplier, that is, the total impact of change in income on consumption after all lagged effects are taken into account, which in the present example becomes

$$\sum_{k=0}^{\infty} \hat\beta_k = \hat\beta_0 \left(\frac{1}{1 - \hat\lambda}\right) = (0.2139)\left(\frac{1}{1 - 0.7971}\right) \approx 1.0542$$

In words, a sustained increase of a dollar in PPDI will eventually lead to about 1.05 dollars increase in PPCE, the immediate, or short-run, impact being only 21 cents.
The long-run consumption function can now be written as:

$$\text{PPCE}_t = -1246.52 + 1.0542\,\text{PPDI}_t$$

This is obtained by setting $\text{PPCE}_t = \text{PPCE}_{t-1}$ in the short-run consumption function given in Table 17.3, moving the lagged PPCE term to the left-hand side, and dividing both sides by $1 - 0.7971 = 0.2029$.¹³
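The arithmetic of Example 17.7 can be reproduced from the estimates in Table 17.3. This sketch uses the rounded values $\hat\beta_0 = 0.2139$ and $\hat\lambda = 0.7971$ quoted in the text together with the reported intercept −252.9190; it also checks that, at a hypothetical steady-state income level of 25,000 dollars, the long-run function satisfies the short-run one with $\text{PPCE}_t = \text{PPCE}_{t-1}$.

```python
# Short-run (Koyck) estimates from Table 17.3, rounded as in the text.
intercept = -252.9190   # estimate of alpha * (1 - lambda)
beta0 = 0.2139          # coefficient on PPDI_t
lam = 0.7971            # coefficient on PPCE_{t-1}

# Distributed-lag coefficients beta_k = beta0 * lam**k, Eq. (17.4.1)
lag_coefs = [beta0 * lam**k for k in range(4)]
print("beta_0..beta_3:", [round(b, 4) for b in lag_coefs])

# Long-run multiplier, Eq. (17.4.2), and long-run intercept
long_run_mpc = beta0 / (1 - lam)
long_run_intercept = intercept / (1 - lam)
print(f"long run: PPCE = {long_run_intercept:.2f} + {long_run_mpc:.4f} PPDI")

# Steady-state check at a hypothetical income level.
ppdi = 25_000.0
ppce_star = long_run_intercept + long_run_mpc * ppdi
short_run_next = intercept + beta0 * ppdi + lam * ppce_star
print("steady state holds:", abs(short_run_next - ppce_star) < 1e-6)
```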

¹³In equilibrium all PPCE values will be the same. Therefore, $\text{PPCE}_t = \text{PPCE}_{t-1}$. Making this substitution, you should get the long-run consumption function.

Table 17.2 PPCE and PPDI, 1959-2006


Year PPCE PPDI Year PPCE PPDI
1959 8,776 9,685 1983 15,656 17,828 
1960 8,873 9,735 1984 16,343 19,011 
1961 8,873 9,901 1985 17,040 19,476 
1962 9,170 10,227 1986 17,570 19,906 
1963 9,412 10,455 1987 17,994 20,072 
1964 9,839 11,061 1988 18,554 20,740 
1965 10,331 11,594 1989 18,898 21,120 
1966 10,793 12,065 1990 19,067 21,281 
1967 10,994 12,457 1991 18,848 21,109 
1968 11,510 12,892 1992 19,208 21,548 
1969 11,820 13,163 1993 19,593 21,493 
1970 11,955 13,563 1994 20,082 21,812 
1971 12,256 14,001 1995 20,382 22,153 
1972 12,868 14,512 1996 20,835 22,546 
1973 1373741 15,345 1997 21,365 23,065 
1974 13,148 15,094 1998 22,183 24,131 
1975 13,320 15,291 1999 23,050 24,564 
1976 13,919 15,738 2000 23,860 25,469 
1977 14,364 16,128 2001 24,205 25,687 
1978 14,837 16,704 2002 24,612 26,217 
1979 15,030 16,931 2003 25,043 26,535 
1980 14,816 16,940 2004 25711 Pipe? 
1981 14,879 17,217 2005 26,277 27,436 
1982 14,944 17,418 2006 26,828 28,005 
Notes: PPCE = per capita personal consumption expenditure in chained 2000 dollars. 
PPDI = per capita personal disposable income in chained 2000 dollars. 
Source: Economic Report of the President, 2007, Table B-31. 
Table 17.3
Dependent Variable: PPCE
Method: Least Squares
Sample (adjusted): 1960-2006
Included observations: 47 after adjustments

Variable      Coefficient    Std. Error    t-Statistic    Prob.
C             -252.9190      157.3517      -1.607348      …
PPDI          0.21389        0.070617      3.028892       0.0041
PPCE(-1)      0.797146       0.073308      10.87389       0.0000

R-squared             0.998216     Mean dependent var.      16691.28
Adjusted R-squared    0.998134     S.D. dependent var.      5205.873
S.E. of regression    224.8504     Akaike info criterion    13.73045
Sum squared resid.    …            Schwarz criterion        13.84854
Log likelihood        -319.6656    Hannan-Quinn criter.     13.77489
F-statistic           12306.99     Durbin-Watson stat.      0.961921
Prob. (F-statistic)   0.000000     Durbin h*                …

*The calculation of Durbin h is discussed in Section 17.10.

In the long run the marginal propensity to consume (MPC) is about 1. This means that when consumers
have had time to adjust to a dollar’s increase in PPDI, they will increase their PPCE by almost a dollar. In 
the short run, however, as Table 17.3 shows, the MPC is only about 21 cents. What is the reason for such a 
difference between the short- and long- run MPC? 

The answer can be found in the median and mean lags. Given $\lambda = 0.7971$, the median lag is

$$-\frac{\log 2}{\log \lambda} = -\frac{\log 2}{\log 0.7971} \approx 3.0565$$

and the mean lag is

$$\frac{\lambda}{1 - \lambda} = \frac{0.7971}{0.2029} \approx 3.9285$$

It seems real PPCE adjusts to real PPDI with a substantial lag: recall that the larger the value of $\lambda$ (between 0 and 1), the longer it takes for the full impact of a change in the value of the explanatory variable to be felt on the dependent variable.


17.5 Rationalization of the Koyck Model: The Adaptive Expectations Model


Although very neat, the Koyck model (17.4.7) is ad hoc since it was obtained by a purely algebraic process; 
it is devoid of any theoretical underpinning. But this gap can be filled if we start from a different perspective. 
Suppose we postulate the following model: 


$$Y_t = \beta_0 + \beta_1 X_t^* + u_t \tag{17.5.1}$$

where Y = demand for money (real cash balances)
X* = equilibrium, optimum, expected long-run or normal rate of interest
u = error term

Equation (17.5.1) postulates that the demand for money is a function of the expected (i.e., anticipated) rate of interest.

Since the expectational variable $X^*$ is not directly observable, let us propose the following hypothesis about how expectations are formed:¹⁴

$$X_t^* - X_{t-1}^* = \gamma (X_t - X_{t-1}^*) \tag{17.5.2}$$

where $\gamma$, such that $0 < \gamma \le 1$, is known as the coefficient of expectation. Hypothesis (17.5.2) is known as the adaptive expectation, progressive expectation, or error learning hypothesis, popularized by Cagan¹⁵ and Friedman.¹⁶


What Eq. (17.5.2) implies is that “economic agents will adapt their expectations in the light of past experience and that in particular they will learn from their mistakes.”¹⁷ More specifically, Eq. (17.5.2) states


14 Sometimes the model is expressed as $X_t^* - X_{t-1}^* = \gamma\,(X_{t-1} - X_{t-1}^*)$.
15 P. Cagan, “The Monetary Dynamics of Hyperinflations,” in M. Friedman (ed.), Studies in the Quantity Theory of Money, University of Chicago Press, Chicago, 1956.


16 Milton Friedman, A Theory of the Consumption Function, National Bureau of Economic Research, Princeton University Press, Princeton, NJ, 1957.


17 G. K. Shaw, Rational Expectations: An Elementary Exposition, St. Martin’s Press, New York, 1984, p. 25.
that expectations are revised each period by a fraction $\gamma$ of the gap between the current value of the variable and its previous expected value. Thus, for our model this would mean that expectations about interest rates are revised each period by a fraction $\gamma$ of the discrepancy between the rate of interest observed in the current period and what its anticipated value had been in the previous period. Another way of stating this would be to write Eq. (17.5.2) as


$$X_t^* = \gamma X_t + (1-\gamma) X_{t-1}^* \tag{17.5.3}$$
which shows that the expected value of the rate of interest at time t is a weighted average of the actual value of the interest rate at time t and its value expected in the previous period, with weights of $\gamma$ and $1-\gamma$, respectively. If $\gamma = 1$, $X_t^* = X_t$, meaning that expectations are realized immediately and fully, that is, in the same time period. If, on the other hand, $\gamma = 0$, $X_t^* = X_{t-1}^*$, meaning that expectations are static, that is, “conditions prevailing today will be maintained in all subsequent periods. Expected future values then become identified with current values.”¹⁸

Substituting Eq. (17.5.3) into Eq. (17.5.1), we obtain 


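$$\begin{aligned} Y_t &= \beta_0 + \beta_1[\gamma X_t + (1-\gamma)X_{t-1}^*] + u_t \\ &= \beta_0 + \beta_1\gamma X_t + \beta_1(1-\gamma)X_{t-1}^* + u_t \end{aligned} \tag{17.5.4}$$

Now lag Eq. (17.5.1) one period, multiply it by $1-\gamma$, and subtract the product from Eq. (17.5.4). After simple algebraic manipulations, we obtain

$$\begin{aligned} Y_t &= \gamma\beta_0 + \gamma\beta_1 X_t + (1-\gamma)Y_{t-1} + u_t - (1-\gamma)u_{t-1} \\ &= \gamma\beta_0 + \gamma\beta_1 X_t + (1-\gamma)Y_{t-1} + v_t \end{aligned} \tag{17.5.5}$$

where $v_t = u_t - (1-\gamma)u_{t-1}$.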

Before proceeding any further, let us note the difference between Eq. (17.5.1) and Eq. (17.5.5). In the former, $\beta_1$ measures the average response of Y to a unit change in X*, the equilibrium or long-run value of X. In Eq. (17.5.5), on the other hand, $\gamma\beta_1$ measures the average response of Y to a unit change in the actual or observed value of X. These responses will not be the same unless, of course, $\gamma = 1$, that is, the current and long-run values of X are the same. In practice, we first estimate Eq. (17.5.5). Once an estimate of $\gamma$ is obtained from the coefficient of lagged Y, we can easily compute $\beta_1$ by simply dividing the coefficient of $X_t$ ($= \gamma\beta_1$) by $\gamma$.
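The algebra behind (17.5.5) can be checked numerically. The sketch below simulates the model using made-up values ($\beta_0 = 2$, $\beta_1 = 0.5$, $\gamma = 0.3$), a hypothetical X series, and an assumed starting expectation $X_0^* = X_0$. It confirms that every observation satisfies (17.5.5) with $v_t = u_t - (1-\gamma)u_{t-1}$. It then recovers $\beta_1$ by dividing the coefficient of $X_t$ by $\gamma = 1 - (\text{coefficient of } Y_{t-1})$. The sketch checks the identity only; it does not estimate anything.

```python
import random


def simulate_ae(beta0, beta1, gamma, n, seed=0):
    """Simulate the adaptive expectations model, Eqs. (17.5.1) and (17.5.3).

    Returns the observed X, the unobserved expectations X*, the errors u and Y.
    """
    rng = random.Random(seed)
    x = [10.0 + rng.gauss(0.0, 2.0) for _ in range(n)]   # hypothetical observed X
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]          # classical disturbances
    x_star = [x[0]]                                      # assumed starting expectation
    for t in range(1, n):
        # (17.5.3): X*_t is a weighted average of X_t and X*_{t-1}
        x_star.append(gamma * x[t] + (1 - gamma) * x_star[t - 1])
    y = [beta0 + beta1 * xs + ut for xs, ut in zip(x_star, u)]   # (17.5.1)
    return x, x_star, u, y


beta0, beta1, gamma = 2.0, 0.5, 0.3
x, x_star, u, y = simulate_ae(beta0, beta1, gamma, n=200)

# (17.5.5): Y_t = gamma*b0 + gamma*b1*X_t + (1-gamma)*Y_{t-1} + v_t,
# with v_t = u_t - (1-gamma)*u_{t-1}; the gap should be pure rounding error.
max_gap = max(
    abs(y[t] - (gamma * beta0 + gamma * beta1 * x[t] + (1 - gamma) * y[t - 1]
                + u[t] - (1 - gamma) * u[t - 1]))
    for t in range(1, len(y))
)
print(f"largest discrepancy: {max_gap:.1e}")

# From short run to long run: gamma comes from the coefficient of Y_{t-1},
# and beta1 is the coefficient of X_t divided by gamma.
coef_lagged_y = 1 - gamma
coef_x = gamma * beta1
gamma_hat = 1 - coef_lagged_y
print(f"recovered beta1: {coef_x / gamma_hat:.6f}")
```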

The similarity between the adaptive expectations model (17.5.5) and the Koyck model (17.4.7) should be 
readily apparent although the interpretations of the coefficients in the two models are different. Note that like 
the Koyck model, the adaptive expectations model is autoregressive and its error term is similar to the Koyck 
error term. We shall return to the estimation of the adaptive expectations model in Section 17.8 and to some 
examples in Section 17.12. Now that we have sketched the adaptive expectations (AE) model, how realistic 
is it? It is true that it is more appealing than the purely algebraic Koyck approach, but is the AE hypothesis 
reasonable? In favor of the AE hypothesis one can say the following: 

“It provides a fairly simple means of modelling expectations in economic theory whilst postulating a mode of behaviour upon the part of economic agents which seems eminently sensible. The belief that people learn from experience is obviously a more sensible starting point than the implicit assumption that they are totally devoid of memory, characteristic of static expectations thesis. Moreover, the assertion that more distant experiences exert a lesser effect than more recent experience would accord with common sense and would appear to be amply confirmed by simple observation.”¹⁹


18 Ibid., pp. 19–20.
19 Ibid., p. 27.
Until the advent of the rational expectations (RE) hypothesis, initially put forward by J. Muth and later propagated by Robert Lucas and Thomas Sargent, the AE hypothesis was quite popular in empirical economics. The proponents of the RE hypothesis contend that the AE hypothesis is inadequate because it relies solely on the past values of a variable in formulating expectations,²⁰ whereas the RE hypothesis assumes that “individual economic agents use current available and relevant information in forming their expectations and do not rely purely upon past experience.”²¹ In short, the RE hypothesis contends that “expectations are ‘rational’ in the sense that they efficiently incorporate all information available at the time the expectation is formulated”²² and not just the past information.

The criticism directed by the RE proponents against the AE hypothesis is well taken, although there are many critics of the RE hypothesis itself.²³ This is not the place to get bogged down with this rather heady material. Perhaps one could agree with Stephen McNees that, “At best, the adaptive expectations assumption can be defended only as a ‘working hypothesis’ proxying for a more complex, perhaps changing expectations formulation mechanism.”²⁴


Example 17.8 Example 17.7 Revisited 


Since the Koyck transformation underlies the adaptive expectations model, the results presented in Table 17.3 can also be interpreted in terms of Equation (17.5.5). Thus $\gamma\hat\beta_0 = -252.9190$, $\gamma\hat\beta_1 = 0.21389$, and $(1-\gamma) = 0.797146$. So the expectation coefficient $\gamma \approx 0.2028$, and, following the preceding discussion about the AE model, we can say that about 20 percent of the discrepancy between actual and expected PPDI is eliminated within a year.


17.6 Another Rationalization of the Koyck Model: The Stock Adjustment, or Partial Adjustment, Model


The adaptive expectations model is one way of rationalizing the Koyck model. Another rationalization is 
provided by Marc Nerlove in the so-called stock adjustment or partial adjustment model (PAM).²⁵ To
illustrate this model, consider the flexible accelerator model of economic theory, which assumes that there is 
an equilibrium, optimal, desired, or long-run amount of capital stock needed to produce a given output under 
the given state of technology, rate of interest, etc. For simplicity assume that this desired level of capital Y* 
is a linear function of output X as follows: 


$$Y_t^* = \beta_0 + \beta_1 X_t + u_t \tag{17.6.1}$$


Since the desired level of capital is not directly observable, Nerlove postulates the following hypothesis, 
known as the partial adjustment, or stock adjustment, hypothesis: 


20 Like the Koyck model, it can be shown that, under AE, expectations of a variable are an exponentially weighted average of past values of that variable.


21 G. K. Shaw, op. cit., p. 47. For additional details of the RE hypothesis, see Steven M. Sheffrin, Rational Expectations, Cambridge University Press, New York, 1983.


22 Stephen K. McNees, “The Phillips Curve: Forward- or Backward-Looking?” New England Economic Review, July–August …


23 For a recent critical appraisal of the RE hypothesis, see Michael C. Lovell, “Tests of the Rational Expectations Hypothesis,” American Economic Review, March 1986, pp. 110–124.


24 Stephen K. McNees, op. cit., p. 50.
25 Marc Nerlove, Distributed Lags and Demand Analysis for Agricultural and Other Commodities, op. cit.
$$Y_t - Y_{t-1} = \delta\,(Y_t^* - Y_{t-1}) \tag{17.6.2}$$


where $\delta$, such that $0 < \delta \le 1$, is known as the coefficient of adjustment and where $Y_t - Y_{t-1}$ = actual change and $(Y_t^* - Y_{t-1})$ = desired change.

Since $Y_t - Y_{t-1}$, the change in capital stock between two periods, is nothing but investment, Eq. (17.6.2) can alternatively be written as


$$I_t = \delta\,(Y_t^* - Y_{t-1}) \tag{17.6.3}$$


where $I_t$ = investment in time period t.

Equation (17.6.2) postulates that the actual change in capital stock (investment) in any given time period t is some fraction $\delta$ of the desired change for that period. If $\delta = 1$, it means that the actual stock of capital is equal to the desired stock; that is, actual stock adjusts to the desired stock instantaneously (in the same time period). However, if $\delta = 0$, it means that nothing changes, since actual stock at time t is the same as that observed in the previous time period. Typically, $\delta$ is expected to lie between these extremes, since adjustment to the desired stock of capital is likely to be incomplete because of rigidity, inertia, contractual obligations, etc.; hence the name partial adjustment model. Note that the adjustment mechanism (17.6.2) can alternatively be written as


$$Y_t = \delta Y_t^* + (1-\delta) Y_{t-1} \tag{17.6.4}$$

showing that the observed capital stock at time t is a weighted average of the desired capital stock at that time and the capital stock existing in the previous time period, with $\delta$ and $(1-\delta)$ being the weights. Now substitution of Eq. (17.6.1) into Eq. (17.6.4) gives


$$\begin{aligned} Y_t &= \delta(\beta_0 + \beta_1 X_t + u_t) + (1-\delta)Y_{t-1} \\ &= \delta\beta_0 + \delta\beta_1 X_t + (1-\delta)Y_{t-1} + \delta u_t \end{aligned} \tag{17.6.5}$$

This model is called the partial adjustment model (PAM).

Since Eq. (17.6.1) represents the long-run, or equilibrium, demand for capital stock, Eq. (17.6.5) can be called the short-run demand function for capital stock, since in the short run the existing capital stock may not necessarily be equal to its long-run level. Once we estimate the short-run function (17.6.5) and obtain the estimate of the adjustment coefficient $\delta$ (from the coefficient of $Y_{t-1}$), we can easily derive the long-run function by simply dividing $\delta\beta_0$ and $\delta\beta_1$ by $\delta$ and omitting the lagged Y term, which will then give Eq. (17.6.1).

Geometrically, the partial adjustment model can be shown as in Figure 17.6.²⁷ In this figure $Y^*$ is the desired capital stock and $Y_1$ the current actual capital stock. For illustrative purposes assume that $\delta = 0.5$. This implies that the firm plans to close half the gap between the actual and the desired stock of capital each period. Thus, in the first period it moves to $Y_2$, with investment equal to $(Y_2 - Y_1)$, which in turn is equal to half of $(Y^* - Y_1)$. In each subsequent period it closes half the gap between the capital stock at the beginning of the period and the desired capital stock $Y^*$.


26 Some authors do not add the stochastic disturbance term $u_t$ to the relation (17.6.1) but add it to the adjustment relation (17.6.2), believing that if the former is truly an equilibrium relation, there is no scope for the error term, whereas the adjustment mechanism can be imperfect and may require the disturbance term. In passing, note that Eq. (17.6.2) is sometimes also written as


Ve = Viet 00K e) 


27 This is adapted from Figure 7.4 from Rudiger Dornbusch and Stanley Fischer, Macroeconomics, 3d ed., McGraw-Hill, New York, 1984, p. 216.
Figure 17.6 The gradual adjustment of the capital stock (vertical axis: capital stock; horizontal axis: time).
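The half-the-gap path described for Figure 17.6 can be traced in a few lines of Python. The starting stock of 100 and the desired stock of 200 are invented for illustration, and $\delta = 0.5$ as in the text. The disturbance term is omitted, so only the adjustment rule (17.6.4) is at work. The function name `partial_adjustment_path` is hypothetical.

```python
def partial_adjustment_path(y_start, y_desired, delta, periods):
    """Capital stock path under the partial adjustment rule (17.6.4), without error term.

    Each period the firm closes the fraction delta of the gap between the
    desired stock and the stock it holds at the start of the period.
    """
    path = [y_start]
    for _ in range(periods):
        y_prev = path[-1]
        path.append(delta * y_desired + (1 - delta) * y_prev)
    return path


path = partial_adjustment_path(y_start=100.0, y_desired=200.0, delta=0.5, periods=5)
investment = [b - a for a, b in zip(path, path[1:])]   # I_t = Y_t - Y_{t-1}, Eq. (17.6.3)
print(path)         # [100.0, 150.0, 175.0, 187.5, 193.75, 196.875]
print(investment)   # [50.0, 25.0, 12.5, 6.25, 3.125]
```

Each investment figure is exactly half of the gap remaining at the start of its period. The stock therefore approaches $Y^*$ geometrically but never quite reaches it.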


The partial adjustment model resembles both the Koyck and adaptive expectations models in that it is autoregressive. But it has a much simpler disturbance term: the original disturbance term $u_t$ multiplied by a constant $\delta$. Bear in mind, however, that although similar in appearance, the adaptive expectations and partial adjustment models are conceptually very different. The former is based on uncertainty (about the future course of prices, interest rates, etc.), whereas the latter is due to technical or institutional rigidities, inertia, cost of change, etc. Both of these models are, however, theoretically much sounder than the Koyck model.

Since in appearance the adaptive expectations and partial adjustment models are indistinguishable, the $\gamma$ coefficient of 0.2028 of the adaptive expectations model can also be interpreted as the $\delta$ coefficient of the stock adjustment model, if we assume that the latter model is operative in the present case (i.e., it is the desired or expected PPCE that is linearly related to the current PPDI).

The important point to keep in mind is that since Koyck, adaptive expectations, and stock adjustment 
models—apart from the difference in the appearance of the error term—yield the same final estimating 
model, a researcher must be extremely careful in telling the reader which model he or she is using and why. 
Thus, researchers must specify the theoretical underpinning of their model. 


*17.7 Combination of Adaptive Expectations and Partial Adjustment Models
Consider the following model: 


$$Y_t^* = \beta_0 + \beta_1 X_t^* + u_t \tag{17.7.1}$$


where $Y_t^*$ = desired stock of capital and $X_t^*$ = expected level of output.
Since neither Y* nor X* is directly observable, one could use the partial adjustment mechanism for Y* and the adaptive expectations model for X* to arrive at the following estimating equation (see Exercise 17.2):


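$$\begin{aligned} Y_t &= \beta_0\delta\gamma + \beta_1\delta\gamma X_t + [(1-\gamma) + (1-\delta)]Y_{t-1} \\ &\quad - (1-\delta)(1-\gamma)Y_{t-2} + [\delta u_t - \delta(1-\gamma)u_{t-1}] \\ &= \alpha_0 + \alpha_1 X_t + \alpha_2 Y_{t-1} + \alpha_3 Y_{t-2} + v_t \end{aligned} \tag{17.7.2}$$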


*Optional. 
where $v_t = \delta[u_t - (1-\gamma)u_{t-1}]$. This model too is autoregressive. The only difference from the purely adaptive expectations model is that $Y_{t-2}$ appears along with $Y_{t-1}$ as an explanatory variable. As in the Koyck and AE models, the error term in Eq. (17.7.2) follows a moving average process. Another feature of this model is that although it is linear in the $\alpha$'s, it is nonlinear in the original parameters.
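To see concretely what “nonlinear in the original parameters” means, the sketch below maps hypothetical structural values into the $\alpha$'s of (17.7.2) and then inverts the mapping. Since $\alpha_2$ and $-\alpha_3$ are the sum and the product of $(1-\gamma)$ and $(1-\delta)$, recovering them requires solving a quadratic. The formulas are also symmetric in $\gamma$ and $\delta$, so the inversion yields the pair of values but cannot say which one is $\gamma$ and which is $\delta$. The function names and parameter values are illustrative only.

```python
import math


def alphas_from_structure(beta0, beta1, gamma, delta):
    """Coefficients alpha0..alpha3 of Eq. (17.7.2) implied by (17.7.1) combined
    with partial adjustment (delta) and adaptive expectations (gamma)."""
    alpha0 = beta0 * delta * gamma
    alpha1 = beta1 * delta * gamma
    alpha2 = (1 - gamma) + (1 - delta)
    alpha3 = -(1 - delta) * (1 - gamma)
    return alpha0, alpha1, alpha2, alpha3


def structure_from_alphas(alpha0, alpha1, alpha2, alpha3):
    """Invert the mapping. (1 - gamma) and (1 - delta) have sum alpha2 and
    product -alpha3, so they are the roots of z**2 - alpha2*z - alpha3 = 0.
    Only the unordered pair {gamma, delta} can be recovered."""
    disc = alpha2 ** 2 + 4 * alpha3          # equals ((1-gamma) - (1-delta))**2
    if disc < 0:
        raise ValueError("alphas imply complex gamma and delta")
    root = math.sqrt(disc)
    pair = sorted((1 - (alpha2 + root) / 2, 1 - (alpha2 - root) / 2))
    product = pair[0] * pair[1]               # = gamma * delta
    return alpha0 / product, alpha1 / product, pair


alphas = alphas_from_structure(beta0=10.0, beta1=0.8, gamma=0.6, delta=0.3)
print("alphas:", [round(a, 4) for a in alphas])
b0, b1, pair = structure_from_alphas(*alphas)
print("beta0, beta1:", round(b0, 4), round(b1, 4), " {gamma, delta}:", [round(p, 4) for p in pair])
```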

A celebrated application of Eq. (17.7.1) has been Friedman’s permanent income hypothesis, which states that “permanent” or long-run consumption is a function of “permanent” or long-run income.²⁸

The estimation of Eq. (17.7.2) presents the same estimation problems as the Koyck or the AE model in 
that all these models are autoregressive with similar error structures. In addition, Eq. (17.7.2) involves some 
nonlinear estimation problems that we consider briefly in Exercise 17.10, but do not delve into in this book. 


17.8 Estimation of Autoregressive Models 


From our discussion thus far we have the following three models: 


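Koyck
$$Y_t = \alpha(1-\lambda) + \beta_0 X_t + \lambda Y_{t-1} + v_t \tag{17.4.7}$$
Adaptive expectations
$$Y_t = \gamma\beta_0 + \gamma\beta_1 X_t + (1-\gamma)Y_{t-1} + [u_t - (1-\gamma)u_{t-1}] \tag{17.5.5}$$
Partial adjustment
$$Y_t = \delta\beta_0 + \delta\beta_1 X_t + (1-\delta)Y_{t-1} + \delta u_t \tag{17.6.5}$$
All these models have the following common form:
$$Y_t = \alpha_0 + \alpha_1 X_t + \alpha_2 Y_{t-1} + v_t \tag{17.8.1}$$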


that is, they are all autoregressive in nature. Therefore, we must now look at the estimation problem of such 
models, because the classical least-squares theory may not be directly applicable to them. The reason is 
twofold: the presence of stochastic explanatory variables and the possibility of serial correlation. 

Now, as noted previously, for the application of the classical least-squares theory, it must be shown that the stochastic explanatory variable $Y_{t-1}$ is distributed independently of the disturbance term $v_t$. To determine whether this is so, it is essential to know the properties of $v_t$. Suppose the original disturbance term $u_t$ satisfies all the classical assumptions, such as $E(u_t) = 0$, $\operatorname{var}(u_t) = \sigma^2$ (the assumption of homoscedasticity), and $\operatorname{cov}(u_t, u_{t+s}) = 0$ for $s \ne 0$ (the assumption of no autocorrelation). Even then, $v_t$ may not inherit all these properties. Consider, for example, the error term in the Koyck model, which is $v_t = (u_t - \lambda u_{t-1})$. Given the assumptions about $u_t$, we can easily show that $v_t$ is serially correlated because²⁹


$$E(v_t v_{t-1}) = -\lambda\sigma^2 \tag{17.8.2}$$

which is nonzero (unless $\lambda$ happens to be zero). And since $Y_{t-1}$ appears in the Koyck model as an explanatory variable, it is bound to be correlated with $v_t$ (via the presence of $u_{t-1}$ in it). As a matter of fact, it can be shown that

$$\operatorname{cov}[Y_{t-1}, (u_t - \lambda u_{t-1})] = -\lambda\sigma^2 \tag{17.8.3}$$

which is the same as Eq. (17.8.2). The reader can verify that the same holds true of the adaptive expectations model.
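A small Monte Carlo experiment makes (17.8.2) and (17.8.3) concrete. The sketch assumes a fixed, nonstochastic X path (a hypothetical sine wave), classical errors $u_t \sim N(0, \sigma^2)$, and illustrative values $\alpha = 5$, $\beta_0 = 0.5$, $\lambda = 0.6$, $\sigma = 1$. Both sample covariances should come out close to $-\lambda\sigma^2 = -0.6$. The function names are hypothetical.

```python
import math
import random


def simulate_koyck(alpha, beta0, lam, sigma, n, seed=1):
    """Simulate Y_t = alpha + beta0 * sum_k lam**k X_{t-k} + u_t with a fixed
    (nonstochastic) X path and classical errors u_t ~ N(0, sigma**2)."""
    rng = random.Random(seed)
    x = [10 + 2 * math.sin(t / 10) for t in range(n)]   # hypothetical X path
    u = [rng.gauss(0.0, sigma) for _ in range(n)]
    y, s = [], 0.0
    for t in range(n):
        s = x[t] + lam * s                  # s_t = sum_k lam**k x_{t-k}
        y.append(alpha + beta0 * s + u[t])
    v = [u[t] - lam * u[t - 1] for t in range(1, n)]    # Koyck error v_t, t >= 1
    return y, u, v


def sample_cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)


lam, sigma = 0.6, 1.0
y, u, v = simulate_koyck(alpha=5.0, beta0=0.5, lam=lam, sigma=sigma, n=100_000)

cov_v = sample_cov(v[1:], v[:-1])   # pairs v_t with v_{t-1}: Eq. (17.8.2)
cov_yv = sample_cov(y[:-1], v)      # pairs Y_{t-1} with v_t: Eq. (17.8.3)
print(f"theory {-lam * sigma**2:.3f}  cov(v_t, v_t-1) {cov_v:.3f}  cov(Y_t-1, v_t) {cov_yv:.3f}")
```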


28 Milton Friedman, A Theory of the Consumption Function, Princeton University Press, Princeton, N.J., 1957.
29 $E(v_t v_{t-1}) = E[(u_t - \lambda u_{t-1})(u_{t-1} - \lambda u_{t-2})] = -\lambda E(u_{t-1}^2)$, since covariances between u's are zero by assumption, $= -\lambda\sigma^2$.
What is the implication of the finding that, in the Koyck model as well as the adaptive expectations model, the stochastic explanatory variable $Y_{t-1}$ is correlated with the error term $v_t$? As noted previously, if an explanatory variable in a regression model is correlated with the stochastic disturbance term, the OLS estimators are not only biased but also inconsistent; that is, even if the sample size is increased indefinitely, the estimators do not approximate their true population values.³⁰ Therefore, estimation of the Koyck and adaptive expectations models by the usual OLS procedure may yield seriously misleading results.

The partial adjustment model is different, however. In this model $v_t = \delta u_t$, where $0 < \delta \le 1$. Therefore, if $u_t$ satisfies the assumptions of the classical linear regression model given previously, so will $\delta u_t$. Thus, OLS estimation of the partial adjustment model will yield consistent estimates, although the estimates tend to be biased in finite, or small, samples.³¹ Intuitively, the reason for consistency is this: although $Y_{t-1}$ depends on $u_{t-1}$ and all the previous disturbance terms, it is not related to the current error term $u_t$. Therefore, as long as $u_t$ is serially independent, $Y_{t-1}$ will also be independent of, or at least uncorrelated with, $u_t$. This satisfies an important assumption of OLS, namely, noncorrelation between the explanatory variable(s) and the stochastic disturbance term.

Although OLS estimation of the stock, or partial, adjustment model provides consistent estimation because of the simple structure of the error term in such a model, one should not assume that it applies rather than the Koyck or adaptive expectations model.³² The reader is strongly advised against doing so. A model should be chosen on the basis of strong theoretical considerations, not simply because it leads to easy statistical estimation. Every model should be considered on its own merit, paying due attention to the stochastic disturbances appearing therein. If OLS cannot be applied straightforwardly in models such as the Koyck or adaptive expectations model, methods need to be devised to resolve the estimation problem. Several alternative estimation methods are available, although some of them may be computationally tedious. In the following section we consider one such method.


17.9 The Method of Instrumental Variables (IV) 


The reason why OLS cannot be applied to the Koyck or adaptive expectations model is that the explanatory variable $Y_{t-1}$ tends to be correlated with the error term $v_t$. If somehow this correlation can be removed, one can apply OLS to obtain consistent estimates, as noted previously. (Note: There will be some small-sample bias.) How can this be accomplished? Liviatan has proposed the following solution.³³

Let us suppose that we find a proxy for $Y_{t-1}$ that is highly correlated with $Y_{t-1}$ but is uncorrelated with $v_t$, where $v_t$ is the error term appearing in the Koyck or adaptive expectations model. Such a proxy is called an instrumental variable (IV).³⁴ Liviatan suggests $X_{t-1}$ as the instrumental variable for $Y_{t-1}$. He further suggests that the parameters of the regression (17.8.1) can be obtained by solving the following normal equations:


30 The proof is beyond the scope of this book and may be found in Griliches, op. cit., pp. 36–38. However, see Chapter 18 for an outline of the proof in another context. See also Asatoshi Maeshiro, “Teaching Regressions with a Lagged Dependent Variable and Autocorrelated Disturbances,” The Journal of Economic Education, Winter 1996, vol. 27, no. 1, pp. 72–84.
31 For proof, see J. Johnston, Econometric Methods, 3d ed., McGraw-Hill, New York, 1984, pp. 360–362. See also H. E. Doran and J. W. B. Guise, Single Equation Methods in Econometrics: Applied Regression Analysis, University of New England Teaching Monograph Series 3, Armidale, NSW, Australia, 1984, pp. 236–244.


32 Also, as J. Johnston notes (op. cit., p. 350), “[the] pattern of adjustment [suggested by the partial adjustment model] . . . may sometimes be implausible.”


33 N. Liviatan, “Consistent Estimation of Distributed Lags,” International Economic Review, vol. 4, January 1963, pp. 44–52.
34 Such instrumental variables are used frequently in simultaneous equation models (see Chapter 20).
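$$\begin{aligned} \sum Y_t &= n\hat\alpha_0 + \hat\alpha_1 \sum X_t + \hat\alpha_2 \sum Y_{t-1} \\ \sum Y_t X_t &= \hat\alpha_0 \sum X_t + \hat\alpha_1 \sum X_t^2 + \hat\alpha_2 \sum Y_{t-1} X_t \\ \sum Y_t X_{t-1} &= \hat\alpha_0 \sum X_{t-1} + \hat\alpha_1 \sum X_t X_{t-1} + \hat\alpha_2 \sum Y_{t-1} X_{t-1} \end{aligned} \tag{17.9.1}$$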


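Notice that if we were to apply OLS directly to Eq. (17.8.1), the usual OLS normal equations would be (see Section 7.4):

$$\begin{aligned} \sum Y_t &= n\hat\alpha_0 + \hat\alpha_1 \sum X_t + \hat\alpha_2 \sum Y_{t-1} \\ \sum Y_t X_t &= \hat\alpha_0 \sum X_t + \hat\alpha_1 \sum X_t^2 + \hat\alpha_2 \sum Y_{t-1} X_t \\ \sum Y_t Y_{t-1} &= \hat\alpha_0 \sum Y_{t-1} + \hat\alpha_1 \sum X_t Y_{t-1} + \hat\alpha_2 \sum Y_{t-1}^2 \end{aligned} \tag{17.9.2}$$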


The difference between the two sets of normal equations should be readily apparent. Liviatan has shown that the $\alpha$'s estimated from Eq. (17.9.1) are consistent, whereas those estimated from Eq. (17.9.2) may not be consistent, because $Y_{t-1}$ and $v_t$ [$= u_t - \lambda u_{t-1}$ or $u_t - (1-\gamma)u_{t-1}$] may be correlated whereas $X_t$ and $X_{t-1}$ are uncorrelated with $v_t$. (Why?)
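The contrast between (17.9.1) and (17.9.2) shows up in simulated data. The sketch below generates a Koyck model with hypothetical values ($\alpha = 2$, $\beta_0 = 1$, $\lambda = 0.6$) and an exogenous but autocorrelated X. X follows an AR(1) process with coefficient 0.5, an assumption made so that $X_{t-1}$ is correlated with $Y_{t-1}$. The sketch then solves both sets of normal equations. It uses plain Python with a hand-written 3 × 3 solver rather than any econometrics package; `solve3`, `normal_equations` and `simulate_koyck` are illustrative helpers. In a large sample the IV estimate of $\lambda$ should settle near 0.6. The OLS estimate should stay visibly below it, because $Y_{t-1}$ and $v_t$ are negatively correlated.

```python
import random


def solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def normal_equations(y, x, ylag, weights):
    """Solve sum(w * Y_t) = sum(w * [1, X_t, Y_{t-1}]) @ alpha for each w in weights.

    weights = [1, X_t, Y_{t-1}] reproduces the OLS equations (17.9.2);
    weights = [1, X_t, X_{t-1}] reproduces Liviatan's IV equations (17.9.1).
    """
    regressors = [[1.0] * len(y), x, ylag]
    a = [[sum(w * r for w, r in zip(wk, reg)) for reg in regressors] for wk in weights]
    b = [sum(w * yt for w, yt in zip(wk, y)) for wk in weights]
    return solve3(a, b)


def simulate_koyck(n, alpha, beta0, lam, phi=0.5, seed=7, burn=200):
    """Y_t = alpha + beta0 * sum_k lam**k X_{t-k} + u_t with AR(1) X and N(0, 1) errors."""
    rng = random.Random(seed)
    xs, ys = [], []
    x_t = s_t = 0.0
    for _ in range(n + burn):
        x_t = phi * x_t + rng.gauss(0.0, 1.0)    # exogenous, autocorrelated X
        s_t = x_t + lam * s_t                     # geometric sum of current and past X
        ys.append(alpha + beta0 * s_t + rng.gauss(0.0, 1.0))
        xs.append(x_t)
    return xs[burn:], ys[burn:]                   # drop burn-in so start-up values fade


alpha, beta0, lam = 2.0, 1.0, 0.6
x_all, y_all = simulate_koyck(20_000, alpha, beta0, lam)
y, x = y_all[1:], x_all[1:]
ylag, xlag = y_all[:-1], x_all[:-1]
ones = [1.0] * len(y)

ols = normal_equations(y, x, ylag, [ones, x, ylag])
iv = normal_equations(y, x, ylag, [ones, x, xlag])
print("true:", [alpha * (1 - lam), beta0, lam])
print("OLS: ", [round(c, 3) for c in ols])
print("IV:  ", [round(c, 3) for c in iv])
```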

Although easy to apply in practice once a suitable proxy is found, the Liviatan technique is likely to suffer from the multicollinearity problem, because $X_t$ and $X_{t-1}$, which enter the normal equations (17.9.1), are likely to be highly correlated. (As noted in Chapter 12, most economic time series typically exhibit a high degree of correlation between successive values.) The implication, then, is that although the Liviatan procedure yields consistent estimates, the estimators are likely to be inefficient.³⁵

Before we move on, the obvious question is: How does one find a “good” proxy for $Y_{t-1}$ that is highly correlated with $Y_{t-1}$ yet uncorrelated with $v_t$? There are some suggestions in the literature, which we take up by way of an exercise (see Exercise 17.5). But it must be stated that finding good proxies is not always easy. In that case the IV method is of little practical use, and one may have to resort to maximum likelihood estimation techniques, which are beyond the scope of this book.³⁶

Is there a test one can use to find out if the chosen instrument(s) is valid? Dennis Sargan has developed a 
test, dubbed the SARG test, for this purpose. The test is described in Appendix 17A, Section 17A.1. 


17.10 Detecting Autocorrelation in Autoregressive Models: Durbin h Test


As we have seen, the likely serial correlation in the errors $v_t$ makes the estimation problem in the autoregressive model rather complex. In the stock adjustment model the error term $v_t$ did not have (first-order) serial correlation if the error term $u_t$ in the original model was serially uncorrelated. In the Koyck and adaptive expectations models, however, $v_t$ was serially correlated even if $u_t$ was serially independent. The question, then, is: How does one know if there is serial correlation in the error term appearing in the autoregressive models?

As noted in Chapter 12, the Durbin–Watson d statistic may not be used to detect (first-order) serial correlation in autoregressive models, because the computed d value in such models generally tends toward 2, which


35 To see how the efficiency of the estimators can be improved, consult Lawrence R. Klein, A Textbook of Econometrics, 2d ed., Prentice-Hall, Englewood Cliffs, N.J., 1974, p. 99. See also William H. Greene, Econometric Analysis, 2d ed., Macmillan, New York, 1993, pp. 535–538.

36 For a condensed discussion of the ML methods, see J. Johnston, op. cit., pp. 366–371, as well as Appendix 4A and Appendix 15A.
is the value of d expected in a truly random sequence. In other words, if we routinely compute the d statistic for such models, there is a built-in bias against discovering (first-order) serial correlation. Despite this, many researchers compute the d value for want of anything better. However, Durbin himself has proposed a large-sample test of first-order serial correlation in autoregressive models.³⁷ This test is called the h statistic.

We have already discussed the Durbin h test in Exercise 12.36. For convenience, we reproduce the h statistic (with a slight change in notation):


$$h = \hat\rho\,\sqrt{\frac{n}{1 - n\,[\operatorname{var}(\hat\alpha_2)]}} \tag{17.10.1}$$


where n is the sample size, $\operatorname{var}(\hat\alpha_2)$ is the variance of the coefficient of lagged $Y_t$ ($= Y_{t-1}$) in Eq. (17.8.1), and $\hat\rho$ is an estimate of the first-order serial correlation $\rho$, first discussed in Chapter 12.

As noted in Exercise 12.36, Durbin has shown that, for a large sample and under the null hypothesis that $\rho = 0$, the h statistic of Eq. (17.10.1) follows the standard normal distribution. That is,


$$h_{\text{asy}} \sim N(0, 1) \tag{17.10.2}$$


where asy means asymptotically. 
In practice, as noted in Chapter 12, one can estimate $\rho$ as


$$\hat\rho \approx 1 - \frac{d}{2} \tag{17.10.3}$$

It is interesting to observe that although we cannot use the Durbin d to test for autocorrelation in autoregressive models, we can use it as an input in computing the h statistic.
Let us illustrate the use of the h statistic with our Example 17.7. In this example, n = 47, $\hat\rho \approx 1 - (d/2) = 0.5190$ (Note: d = 0.9619), and $\operatorname{var}(\hat\alpha_2)$, the variance of the coefficient of $\text{PPCE}_{t-1}$, is $(0.0733)^2 = 0.0053$. Putting these values in Eq. (17.10.1), we obtain:

$$h = 0.5190\,\sqrt{\frac{47}{1 - 47(0.0053)}} \approx 4.1061 \tag{17.10.4}$$
Since this h value has the standard normal distribution under the null hypothesis, the probability of obtaining such a high h value is very small. Recall that the probability that a standard normal variate exceeds the value of +3 is extremely small. In the present example our conclusion, then, is that there is (positive) autocorrelation. Of course, bear in mind that h follows the standard normal distribution only asymptotically. Our sample of 47 observations is reasonably large.
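The arithmetic of (17.10.1) and (17.10.3) fits in a short function. The sketch uses the values quoted for Example 17.7. The square root is undefined when $n\,\operatorname{var}(\hat\alpha_2) \ge 1$, so the function refuses to compute h in that case. It uses the unrounded $\hat\rho = 1 - 0.9619/2 = 0.51905$, which gives a value marginally above 4.1061. The function names are hypothetical.

```python
import math


def durbin_h(d, n, var_lag_coef):
    """Durbin's h statistic, Eq. (17.10.1), with rho estimated from d via Eq. (17.10.3).

    d            : Durbin-Watson d from the autoregressive regression
    n            : sample size
    var_lag_coef : estimated variance of the coefficient of Y_{t-1}
    """
    rho_hat = 1 - d / 2
    denom = 1 - n * var_lag_coef
    if denom <= 0:
        raise ValueError("h is undefined when n * var(alpha2_hat) >= 1")
    return rho_hat * math.sqrt(n / denom)


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variate Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


h = durbin_h(d=0.9619, n=47, var_lag_coef=0.0053)
print(f"h = {h:.4f}, P(Z > h) = {normal_upper_tail(h):.1e}")
```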

Note these features of the h statistic.


1. It does not matter how many X variables or how many lagged values of Y are included in the regression model. To compute h, we need consider only the variance of the coefficient of $Y_{t-1}$.

2. The test is not applicable if $n\,\operatorname{var}(\hat\alpha_2)$ exceeds 1. (Why?) In practice, though, this does not usually happen.

3. Since the test is a large-sample test, its application in small samples is not strictly justified, as shown by Inder³⁸ and Kiviet.³⁹ It has been suggested that the Breusch–Godfrey (BG) test, also known as the


37 J. Durbin, “Testing for Serial Correlation in Least-Squares Regression When Some of the Regressors Are Lagged Dependent Variables,” Econometrica, vol. 38, 1970, pp. 410–421.

38 B. Inder, “An Approximation to the Null Distribution of the Durbin–Watson Statistic in Models Containing Lagged Dependent Variables,” Econometric Theory, vol. 2, no. 3, 1986, pp. 413–428.

39 J. F. Kiviet, “On the Rigour of Some Misspecification Tests for Modelling Dynamic Relationships,” Review of Economic Studies, vol. 53, no. 173, 1986, pp. 241–262.
Lagrange multiplier test, discussed in Chapter 12, is statistically more powerful not only in large samples but also in finite, or small, samples, and is therefore preferable to the h test.⁴⁰


The conclusion based on the h test that our model suffers from autocorrelation is confirmed by the Breusch–Godfrey (BG) test, which is shown in Equation (12.6.17). Using seven lagged values of the residuals estimated from the regression shown in Table 17.3, the BG test shown in Eq. (12.6.18) gave a $\chi^2$ value of 15.3869. For seven degrees of freedom (the number of lagged residuals used in the BG test), the probability of obtaining a chi-square value of 15.38 or greater is about 3 percent, which is quite low.
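The 3 percent figure is the upper-tail probability of a $\chi^2_7$ variate beyond 15.3869. If no statistics library is at hand, it can be computed from the power series for the regularized incomplete gamma function. The sketch below is adequate for moderate arguments like this one, but it loses accuracy for extremely small tail probabilities. The function name `chi2_sf` is hypothetical.

```python
import math


def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom > x).

    Uses the power series for the regularized lower incomplete gamma
    function P(a, z) with a = df / 2 and z = x / 2, then returns 1 - P.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a                 # k = 0 term of sum z**k / (a (a+1) ... (a+k))
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower


p = chi2_sf(15.3869, df=7)
print(f"P(chi2_7 > 15.3869) = {p:.4f}")
```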

For this reason, we need to correct the standard errors shown in Table 17.3, which can be done by the 
Newey—West HAC procedure discussed in Chapter 12. The results are as shown in Table 17.4. 

It seems OLS underestimates the standard errors of the regression coefficients. 


Table 17.4

Dependent Variable: PPCE
Method: Least Squares
Sample (adjusted): 1960-2006
Included observations: 47 after adjustments
Newey-West HAC Standard Errors & Covariance (lag truncation = 3)

Variable      Coefficient    Std. Error    t Statistic    Prob.
C             -252.9190      168.4610      -1.501350      0.1404
PPDI             0.213890      0.051245      4.173888     0.0001
PPCE(-1)         0.797146      …             …            0.0000

R-squared             0.998216     Mean dependent var.       16691.28
Adjusted R-squared    0.998134     S.D. dependent var.        5205.873
S.E. of regression    224.8504     Akaike info criterion        13.73045
Sum squared resid.    2224539.7    Schwarz criterion            13.84854
Log likelihood        -319.6656    Hannan-Quinn criter.         13.77489
F-statistic           …            Durbin-Watson stat.           0.961921
Prob. (F-statistic)   0.000000


17.11 A Numerical Example: The Demand for Money in Canada, 1979-I to 1988-IV


To illustrate the use of the models we have discussed thus far, consider one of the earlier empirical applications, namely, the demand for money (or real cash balances). In particular, consider the following model.⁴¹


$$M_t^* = \beta_0 R_t^{\beta_1} Y_t^{\beta_2} e^{u_t} \tag{17.11.1}$$


40 Gabor Korosi, Laszlo Matyas, and Istvan P. Szekely, Practical Econometrics, Ashgate Publishing Company, Brookfield, Vermont, 1992, p. 92.

41 For a similar model, see Gregory C. Chow, “On the Long-Run and Short-Run Demand for Money,” Journal of Political Economy, vol. 74, no. 2, 1966, pp. 111–131. Note that one advantage of the multiplicative function is that the exponents of the variables give direct estimates of elasticities (see Chapter 6).
where $M_t^*$ = desired, or long-run, demand for money (real cash balances)
$R_t$ = long-term interest rate, %
$Y_t$ = aggregate real national income


For statistical estimation, Eq. (17.11.1) may be expressed conveniently in log form as 
$$\ln M_t^* = \ln\beta_0 + \beta_1 \ln R_t + \beta_2 \ln Y_t + u_t \tag{17.11.2}$$


Since the desired demand variable is not directly observable, let us assume the stock adjustment hypothesis, 
namely, 


$$\frac{M_t}{M_{t-1}} = \left(\frac{M_t^*}{M_{t-1}}\right)^{\delta} \tag{17.11.3}$$


Equation (17.11.3) states that a constant percentage (why?) of the discrepancy between the actual and desired 
real cash balances is eliminated within a single period (year). In log form, Eq. (17.11.3) may be expressed as 


$$\ln M_t - \ln M_{t-1} = \delta(\ln M_t^* - \ln M_{t-1}) \tag{17.11.4}$$

Substituting $\ln M_t^*$ from Eq. (17.11.2) into Eq. (17.11.4) and rearranging, we obtain⁴²

$$\ln M_t = \delta\ln\beta_0 + \beta_1\delta\ln R_t + \beta_2\delta\ln Y_t + (1-\delta)\ln M_{t-1} + \delta u_t \tag{17.11.5}$$


which may be called the short-run demand function for money. (Why?) 

As an illustration of the short-term and long-term demand for real cash balances, consider the data given in 
Table 17.5. These quarterly data pertain to Canada for the period 1979 to 1988. The variables are defined as 
follows: M [as defined by M1 money supply, Canadian dollars (C$), millions], P (implicit price deflator, 1981 = 100), GDP at constant 1981 prices (C$, millions), and R (90-day prime corporate rate of interest, %).⁴³
M1 was deflated by P to obtain figures for real cash balances. A priori, real money demand is expected to be 
positively related to GDP (positive income effect) and negatively related to R (the higher the interest rate, the 
higher the opportunity cost of holding money, as M1 money pays very little interest, if any). 

The regression results were as follows:⁴⁴


$$\widehat{\ln M_t} = 0.8561 - 0.0634 \ln R_t - 0.0237 \ln \text{GDP}_t + 0.9607 \ln M_{t-1} \tag{17.11.6}$$
se = (0.5101)   (0.0131)   (0.0366)   (0.0414)
t = (1.6782)   (-4.8134)   (-0.6466)   (23.1972)
R² = 0.9482   d = 2.4582   F = 213.7234
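The long-run counterparts of (17.11.6) can be computed by following the recipe given for the partial adjustment model: divide the short-run coefficients by $\hat\delta = 1 - {}$the coefficient of the lagged dependent variable. These are point estimates only (see footnote 44 on their standard errors). Because the short-run income coefficient is statistically insignificant (t ≈ -0.65), its long-run version should be read with the same caution. The function name `long_run_from_pam` is hypothetical.

```python
def long_run_from_pam(short_run, coef_lagged):
    """Divide the short-run coefficients of a partial adjustment model by
    delta_hat = 1 - (coefficient of the lagged dependent variable)."""
    delta_hat = 1 - coef_lagged
    return delta_hat, {name: b / delta_hat for name, b in short_run.items()}


delta_hat, long_run = long_run_from_pam(
    {"constant (ln beta0)": 0.8561, "ln R": -0.0634, "ln GDP": -0.0237},
    coef_lagged=0.9607,
)
print(f"delta_hat = {delta_hat:.4f}")
for name, value in long_run.items():
    print(f"{name:>20}: {value:.4f}")
```

An adjustment coefficient of about 0.04 per quarter means that only a small fraction of the gap between desired and actual real balances is closed each period.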


The estimated short-run demand function shows that the short-run interest elasticity has the correct sign 
and that it is statistically quite significant, as its p value is almost zero. The short-run income elasticity is 


⁴²In passing, note that this model is essentially nonlinear in the parameters. Therefore, although OLS may give an unbiased estimate of, say, $\beta_1\delta$ taken together, it may not give unbiased estimates of $\beta_1$ and $\delta$ individually, especially if the sample is small.


⁴³These data are obtained from B. Bhaskar Rao, ed., Cointegration for the Applied Economist, St. Martin's Press, New York, 1994, pp. 210-213. The original data are from 1956-I to 1988-IV, but for illustration purposes we begin our analysis from the first quarter of 1979.

⁴⁴Note this feature of the estimated standard errors. The standard error of, say, the coefficient of $\ln R_t$ refers to the standard error of $\widehat{\beta_1\delta}$, an estimator of $\beta_1\delta$. There is no simple way to obtain the standard errors of $\hat\beta_1$ and $\hat\delta$ individually from the standard error of $\widehat{\beta_1\delta}$, especially if the sample is relatively small. For large samples, however, individual standard errors of $\hat\beta_1$ and $\hat\delta$ can be obtained approximately, but the computations are involved. See Jan Kmenta, Elements of Econometrics, Macmillan, New York, 1971, p. 444.


Table 17.5 Money, Interest Rate, Price Index, and GDP, Canada

Observation   M1           R             P             GDP
1979-1        22,175.00    [illegible]   0.77947       334,800
1979-2        22,841.00    11.16667      0.80861       336,708
1979-3        23,461.00    11.80000      0.82649       340,096
1979-4        23,427.00    14.18333      0.84863       341,844
1980-1        23,811.00    14.38333      0.86693       342,776
1980-2        23,612.33    12.98333      0.88950       342,264
1980-3        24,543.00    10.71667      0.91553       340,716
1980-4        25,638.66    14.53333      0.93743       347,780
1981-1        25,316.00    17.13333      0.96523       354,836
1981-2        25,501.33    18.56667      0.98774       359,352
1981-3        25,382.33    21.01666      1.01314       356,152
1981-4        24,753.00    16.61665      1.03410       353,636
1982-1        25,094.33    15.35000      1.05743       349,568
1982-2        25,253.66    16.04999      1.07748       345,284
1982-3        24,936.66    14.31667      1.09666       343,028
1982-4        25,553.00    10.88333      1.11641       340,292
1983-1        26,755.33    9.616670      1.12303       346,072
1983-2        27,412.00    9.316670      1.13395       353,860
1983-3        28,403.33    9.333330      1.14721       359,544
1983-4        28,402.33    9.550000      1.16059       362,304
1984-1        28,715.66    10.08333      [illegible]   368,280
1984-2        28,996.33    11.45000      1.17406       376,768
1984-3        28,479.33    12.45000      1.17795       381,016
1984-4        28,669.00    10.76667      1.18438       385,396
1985-1        29,018.66    10.51667      1.18990       390,240
1985-2        29,398.66    9.666670      1.20625       391,580
1985-3        30,203.66    9.033330      1.21492       396,384
1985-4        31,059.33    9.016670      1.21805       405,308
1986-1        30,745.33    11.03333      1.22408       405,680
1986-2        30,477.66    8.733330      1.22856       408,116
1986-3        31,563.66    8.466670      1.23916       409,160
1986-4        32,800.66    8.400000      1.25368       409,616
1987-1        33,958.33    7.250000      [illegible]   416,484
1987-2        35,795.66    8.300000      1.28429       422,916
1987-3        35,878.66    9.300000      1.29599       429,980
1987-4        36,336.00    8.700000      1.31001       436,264
1988-1        36,480.33    8.616670      1.32325       440,592
1988-2        37,108.66    9.133330      1.33219       446,680
1988-3        38,423.00    10.05000      1.35065       450,328
1988-4        38,480.66    10.83333      1.36648       453,516

Notes: M1 = C$, millions. 


P = implicit price deflator (1981 = 100). 


R = 90-day prime corporate interest rate, %. 
GDP = C$, millions (1981 prices). 


Source: Rao, op. cit., pp. 210-213. 
surprisingly negative, although statistically it is not different from zero. The coefficient of adjustment is $\delta = (1 - 0.9607) = 0.0393$, implying that only about 4 percent of the discrepancy between the desired and actual real cash balances is eliminated in a quarter, a rather slow adjustment.

To get back to the long-run demand function (17.11.2), all that needs to be done is to divide the short-run demand function through by $\delta$ (why?) and drop the $\ln M_{t-1}$ term. The results are:


$$\widehat{\ln M_t^*} = 21.7888 - 1.6132\ln R_t - 0.6030\ln \mathrm{GDP}_t$$⁴⁵


As can be seen, the long-run interest elasticity of demand for money is substantially greater (in absolute 
terms) than the corresponding short-run elasticity, which is also true of the income elasticity, although in the 
present instance its economic and statistical significance is dubious. 
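The step from the short-run to the long-run function is plain arithmetic, and a few lines of Python make it explicit. This is a minimal sketch that uses only the rounded point estimates of Eq. (17.11.6). The dictionary keys are illustrative labels. Because the inputs are rounded to four decimals, the recovered intercept differs slightly from the 21.7888 shown above.

```python
# Short-run point estimates from Eq. (17.11.6); key names are illustrative labels
short_run = {
    "intercept": 0.8561,   # estimates delta * ln(beta_0)
    "ln_R": -0.0634,       # estimates beta_1 * delta
    "ln_GDP": -0.0237,     # estimates beta_2 * delta
}
lagged_coef = 0.9607       # coefficient of ln M_{t-1}, estimates (1 - delta)

# Coefficient of adjustment
delta = 1.0 - lagged_coef

# Long-run coefficients: divide each short-run coefficient by delta
long_run = {name: coef / delta for name, coef in short_run.items()}

print(f"delta = {delta:.4f}")
for name, coef in long_run.items():
    print(f"{name:>9}: {coef:.4f}")
```

Only point estimates can be recovered this way. The standard errors of the long-run coefficients are a separate matter (see footnote 44).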

Note that the estimated Durbin-Watson d is 2.4582, which is close to 2. This substantiates our previous remark that in the autoregressive models the computed d is generally close to 2. Therefore, we should not trust the computed d to find out whether there was serial correlation in our data. The sample size in our case is 40 observations, which may be reasonably large to apply the h test. In the present case, the reader can verify that the estimated h value is −1.5008, which is not significant at the 5 percent level, perhaps suggesting that there is no first-order autocorrelation in the error term.
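The h value quoted above can be reproduced from the reported d, the sample size, and the standard error of the coefficient of $\ln M_{t-1}$. The calculation uses Durbin's large-sample formula $h = \hat\rho\sqrt{n/[1 - n\,\widehat{\text{var}}(\hat\alpha)]}$ with $\hat\rho \approx 1 - d/2$, where $\hat\alpha$ is the coefficient of the lagged regressand. The Python sketch below assumes n = 40, as stated in the text. The function name `durbin_h` is ours. Because the inputs are rounded, the result is close to, but not exactly, −1.5008.

```python
import math

def durbin_h(d, n, se_lagged):
    """Durbin h statistic for a regression with a lagged dependent regressor.

    d         : Durbin-Watson d of the regression
    n         : sample size
    se_lagged : standard error of the coefficient on the lagged regressand
    """
    rho_hat = 1.0 - d / 2.0            # approximate first-order autocorrelation
    denom = 1.0 - n * se_lagged ** 2   # h is undefined unless this is positive
    if denom <= 0:
        raise ValueError("n * var(coefficient) >= 1: h test not applicable")
    return rho_hat * math.sqrt(n / denom)

h = durbin_h(d=2.4582, n=40, se_lagged=0.0414)
print(f"h = {h:.4f}")  # compare |h| with the standard normal critical value 1.96
```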


17.12 Illustrative Examples 


In this section we present a few examples of distributed lag models to show how researchers have used them 
in empirical studies. 


Example 17.9 The Fed and the Real Rate of Interest 


To assess the effect of M1 (currency + checkable deposits) growth on an Aaa bond real interest rate measure, G. J. Santoni and Courtenay C. Stone⁴⁶ estimated the following distributed-lag model for the United States, using monthly data:


$$r_t = \text{constant} + \sum_{i=0}^{11} a_i \dot M_{t-i} + u_t \tag{17.12.1}$$

where $r_t$ = Moody's Index of Aaa bond yield minus the average annual rate of change in the seasonally adjusted consumer price index over the prior 36 months, which is used as the measure of the real interest rate, and $\dot M_t$ = monthly M1 growth.

According to the “neutrality of money doctrine,” real economic variables—such as output, employment, 
economic growth, and the real rate of interest—are not influenced permanently by money growth and, 
therefore, are essentially unaffected by monetary policy. ... Given this argument, the Federal Reserve has no 
permanent influence over the real rate of interest whatsoever.⁴⁷

If this doctrine is valid, then one should expect the distributed-lag coefficients $a_i$, as well as their sum, to be statistically indistinguishable from zero. To find out whether this is the case, the authors estimated Eq. (17.12.1) for two different time periods, February 1951 to September 1979 and October 1979 to November 1982. The latter period takes into account the change in the Fed's monetary policy: since October 1979 the Fed has paid more attention to the rate of growth of the money supply than to the rate of interest, which had been its focus in the earlier period. Their regression results are presented in Table 17.6. The results seem to support the “neutrality


⁴⁵Note that we have not presented the standard errors of the estimated coefficients for reasons discussed in footnote 44.
⁴⁶“The Fed and the Real Rate of Interest,” Review, Federal Reserve Bank of St. Louis, December 1982, pp. 8-18.
⁴⁷Ibid., p. 15.


Dynamic Econometric Models: Autoregressive and Distributed-Lag Models 677 


of money doctrine,” since for the period February 1951 to September 1979 neither current nor lagged money growth had a statistically significant effect on the real interest rate measure. For the latter period, too, the neutrality doctrine seems to hold, since $\sum a_i$ is not statistically different from zero; only the coefficient $a_1$ is significant, but it has the wrong sign. (Why?)



Table 17.6 Influence of Monthly M1 Growth on an Aaa Bond Real Interest Rate Measure: February 1951 to November 1982

$$r_t = \text{constant} + \sum_{i=0}^{11} a_i \dot M_{t-i}$$

                 February 1951 to          October 1979 to
                 September 1979            November 1982
                 Coefficient   |t|*        Coefficient   |t|*
Constant          1.4885†      2.068        1.0360       0.801
a0               -0.00088      0.388        0.00840      1.014
a1                0.00171      0.510        0.03960†     3.419
a2                0.00170      0.423        0.03112      2.003
a3                0.00233      0.542        0.02719      1.502
a4               -0.00249      0.553        0.00901      0.423
a5               -0.00160      0.348        0.01940      0.863
a6                0.00292      0.631        0.02411      1.056
a7                0.00253      0.556        0.01446      0.666
a8                0.00000      0.001       -0.00036      0.019
a9                0.00074      0.181       -0.00499      0.301
a10               0.00016      0.045       -0.01126      0.888
a11               0.00025      0.107       -0.00178      0.211
Σ a_i             0.00737      0.221        0.1549       0.926
R²                0.9826                    0.8662
D-W               2.07                      2.04
RHO1              1.27†       24.536        1.40†        9.838
RHO2             -0.28†        5.410       -0.48†        3.373
NOB             344                        38
SER (= RSS)       0.1548                    0.3899

*|t| = absolute t value.
†Significantly different from zero at the 0.05 level.


Source: G. J. Santoni and Courtenay C. Stone, “The Fed and the Real Rate of Interest,” Review, Federal Reserve Bank of St. Louis, 
December 1982, p. 16. 
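The $\Sigma a_i$ row of Table 17.6 is simply the sum of the twelve lag coefficients. It measures the total (long-run) effect of a sustained change in money growth on the real rate. The short check below uses only the point estimates in the table and reproduces that row. Judging whether the sum differs from zero would also require the covariances of the estimates, which the table does not report.

```python
# Lag coefficients a_0, ..., a_11 from Table 17.6
feb51_sep79 = [-0.00088, 0.00171, 0.00170, 0.00233, -0.00249, -0.00160,
               0.00292, 0.00253, 0.00000, 0.00074, 0.00016, 0.00025]
oct79_nov82 = [0.00840, 0.03960, 0.03112, 0.02719, 0.00901, 0.01940,
               0.02411, 0.01446, -0.00036, -0.00499, -0.01126, -0.00178]

for label, coefs in [("Feb 1951-Sep 1979", feb51_sep79),
                     ("Oct 1979-Nov 1982", oct79_nov82)]:
    # The sum of the distributed-lag coefficients is the long-run multiplier
    print(f"{label}: sum of a_i = {sum(coefs):.5f}")
```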


Example 17.10 The Short- and Long-Run Aggregate Consumption for Sri Lanka, 1967-1993


Suppose consumption C is linearly related to permanent income $X^*$:

$$C_t = \beta_1 + \beta_2 X_t^* + u_t \tag{17.12.2}$$

Since $X_t^*$ is not directly observable, we need to specify the mechanism that generates permanent income.


Suppose we adopt the adaptive expectations hypothesis specified in Eq. (17.5.2). Using Eq. (17.5.2) and simplifying, we obtain the following estimating equation (cf. Eq. 17.5.5):

$$C_t = \alpha_1 + \alpha_2 X_t + \alpha_3 C_{t-1} + v_t \tag{17.12.3}$$

where
$\alpha_1 = \gamma\beta_1$
$\alpha_2 = \gamma\beta_2$
$\alpha_3 = (1 - \gamma)$
$v_t = [u_t - (1 - \gamma)u_{t-1}]$
As we know, $\beta_2$ gives the mean response of consumption to, say, a \$1 increase in permanent income, whereas $\alpha_2$ gives the mean response of consumption to a \$1 increase in current income.
From annual data for Sri Lanka for the period 1967-1993 given in Table 17.7, the following regression results were obtained:⁴⁸


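$$
\begin{aligned}
\hat C_t &= 1038.403 + 0.4043X_t + 0.5009C_{t-1}\\
\text{se} &= (2501.455)\quad(0.0919)\quad(0.1213)\\
t &= (0.4151)\quad(4.3979)\quad(4.1293)\\
& R^2 = 0.9912 \quad d = 1.4162 \quad F = 1298.466
\end{aligned}
\tag{17.12.4}
$$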


where C = private consumption expenditure and X = GDP, both at constant prices. We also introduced the real interest rate into the model, but it was not statistically significant.


Table 17.7 Private Consumption Expenditure and GDP, Sri Lanka 


Observation   PCON      GDP        Observation   PCON      GDP
1967          61,284    78,221     1981          120,477   152,846
1968          68,814    83,326     1982          133,868   164,318
1969          76,766    90,490     1983          148,004   172,414
1970          73,576    92,692     1984          149,735   178,433
1971          73,256    94,814     1985          155,200   185,753
1972          67,502    92,590     1986          154,165   192,059
1973          78,832    101,419    1987          155,445   191,288
1974          80,240    105,267    1988          157,199   196,055
1975          84,477    112,149    1989          158,576   202,477
1976          86,038    116,078    1990          169,238   223,225
1977          96,275    122,040    1991          179,001   233,231
1978          101,292   128,578    1992          183,687   242,762
1979          105,448   136,851    1993          198,273   259,555
1980          114,570   144,734


Notes: PCON = private consumption expenditure. 
GDP = gross domestic product. 


Source: See footnote 48. 
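Because Table 17.7 reproduces the data, regression (17.12.4) can be re-estimated directly. The following pure-Python sketch computes OLS in deviation-from-means form and assumes no econometrics library. If the table has been transcribed correctly, its point estimates should be close to those in Eq. (17.12.4). Standard errors and diagnostics are omitted.

```python
# Table 17.7: private consumption (PCON) and GDP, Sri Lanka, 1967-1993
pcon = [61284, 68814, 76766, 73576, 73256, 67502, 78832, 80240, 84477,
        86038, 96275, 101292, 105448, 114570, 120477, 133868, 148004,
        149735, 155200, 154165, 155445, 157199, 158576, 169238, 179001,
        183687, 198273]
gdp = [78221, 83326, 90490, 92692, 94814, 92590, 101419, 105267, 112149,
       116078, 122040, 128578, 136851, 144734, 152846, 164318, 172414,
       178433, 185753, 192059, 191288, 196055, 202477, 223225, 233231,
       242762, 259555]

def mean(v):
    return sum(v) / len(v)

# Regressand C_t, regressors X_t and C_{t-1}; the first year is lost to the lag
y = pcon[1:]
x = gdp[1:]
c_lag = pcon[:-1]

# Deviations from means keep the normal equations well conditioned
ybar, xbar, cbar = mean(y), mean(x), mean(c_lag)
dy = [v - ybar for v in y]
dx = [v - xbar for v in x]
dc = [v - cbar for v in c_lag]

sxx = sum(a * a for a in dx)
scc = sum(a * a for a in dc)
sxc = sum(a * b for a, b in zip(dx, dc))
sxy = sum(a * b for a, b in zip(dx, dy))
scy = sum(a * b for a, b in zip(dc, dy))

# Solve the 2 x 2 normal equations for the slopes by Cramer's rule
det = sxx * scc - sxc ** 2
alpha2 = (sxy * scc - scy * sxc) / det         # coefficient of X_t
alpha3 = (scy * sxx - sxy * sxc) / det         # coefficient of C_{t-1}
alpha1 = ybar - alpha2 * xbar - alpha3 * cbar  # intercept

resid = [yt - (alpha1 + alpha2 * xt + alpha3 * ct)
         for yt, xt, ct in zip(y, x, c_lag)]

print(f"n = {len(y)}")
print(f"C_t = {alpha1:.3f} + {alpha2:.4f} X_t + {alpha3:.4f} C_(t-1)")
```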


The results show that the short-run marginal propensity to consume (MPC) is 0.4043, suggesting that a 1 rupee increase in the current or observed real income (as measured by real GDP) would increase mean consumption by about 0.40 rupee. But if the increase in income is sustained, then eventually the MPC out of the permanent income will be $\beta_2 = \gamma\beta_2/\gamma = 0.4043/0.4991 \approx 0.8100$, or about 0.81 rupee. In other words, when consumers have had time to adjust to the 1 rupee change in income, they will ultimately increase their consumption by about 0.81 rupee.
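The conversion from the short-run to the long-run MPC uses only the two slope estimates of Eq. (17.12.4). The following is a minimal sketch. Under the partial adjustment reading of the same regression, the computed `gamma` plays the role of the adjustment coefficient δ, and the arithmetic is identical.

```python
# Point estimates from Eq. (17.12.4)
alpha2 = 0.4043          # coefficient of X_t: short-run MPC (= gamma * beta_2)
alpha3 = 0.5009          # coefficient of C_{t-1} (= 1 - gamma)

gamma = 1.0 - alpha3     # coefficient of expectations (or adjustment)
beta2 = alpha2 / gamma   # long-run MPC out of permanent income

print(f"gamma = {gamma:.4f}, long-run MPC = {beta2:.4f}")
```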

Now suppose that our consumption function were

$$C_t^* = \beta_1 + \beta_2 X_t + u_t \tag{17.12.5}$$


⁴⁸The data are obtained from the data disk in Chandan Mukherjee, Howard White, and Marc Wuyts, Econometrics and Data Analysis for Developing Countries, Routledge, New York, 1998. The original data are from the World Bank's World Tables.
In this formulation permanent or long-run consumption $C_t^*$ is a linear function of the current or observed income. Since $C_t^*$ is not directly observable, let us invoke the partial adjustment model (17.6.2). Using this model, and after algebraic manipulations, we obtain

$$
\begin{aligned}
C_t &= \delta\beta_1 + \delta\beta_2 X_t + (1-\delta)C_{t-1} + \delta u_t\\
&= \alpha_1 + \alpha_2 X_t + \alpha_3 C_{t-1} + v_t
\end{aligned}
\tag{17.12.6}
$$


In appearance, this model is indistinguishable from the adaptive expectations model (17.12.3). Therefore, the regression results given in (17.12.4) are equally applicable here. However, there is a major difference in the interpretation of the two models, not to mention the estimation problem associated with the autoregressive and possibly serially correlated model (17.12.3). The model (17.12.5) is the long-run, or equilibrium, consumption function, whereas the model (17.12.6) is the short-run consumption function. $\beta_2$ measures the long-run MPC, whereas $\alpha_2$ ($= \delta\beta_2$) gives the short-run MPC. The former can be obtained from the latter by dividing it by $\delta$, the coefficient of adjustment.

Returning to (17.12.4), we can now interpret 0.4043 as the short-run MPC. Since $\delta = 0.4991$, the long-run MPC is 0.81. Note that the adjustment coefficient of about 0.50 suggests that in any given time period consumers adjust their consumption only one-half of the way toward its desired or long-run level.

This example brings out the crucial point that in appearance the adaptive expectations and the partial 
adjustment models, or the Koyck model for that matter, are so similar that by just looking at the estimated 
regression, such as Eq. (17.12.4), one cannot tell which is the correct specification. That is why it is so vital 
that one specify the theoretical underpinning of the model chosen for empirical analysis and then proceed 
appropriately. If habit or inertia characterizes consumption behavior, then the partial adjustment model is 
appropriate. On the other hand, if consumption behavior is forward-looking in the sense that it is based on 
expected future income, then the adaptive expectations model is appropriate. If it is the latter, then, one will 
have to pay close attention to the estimation problem to obtain consistent estimators. In the former case, the 
OLS will provide consistent estimators, provided the usual OLS assumptions are fulfilled. 


17.13 The Almon Approach to Distributed-Lag Models: The Almon or Polynomial Distributed Lag (PDL)⁴⁹


Although used extensively in practice, the Koyck distributed-lag model is based on the assumption that the $\beta$ coefficients decline geometrically as the lag lengthens (see Figure 17.5). This assumption may be too restrictive in some situations. Consider, for example, Figure 17.7.

In Figure 17.7a it is assumed that the $\beta$'s increase at first and then decrease, whereas in Figure 17.7c it is assumed that they follow a cyclical pattern. Obviously, the Koyck scheme of distributed-lag models will not work in these cases. However, after looking at Figures 17.7a and c, it seems that one can express $\beta_i$ as a function of $i$, the length of the lag (time), and fit suitable curves to reflect the functional relationship between the two, as indicated in Figures 17.7b and d. This approach is precisely the one suggested by Shirley Almon. To illustrate her technique, let us revert to the finite distributed-lag model considered previously, namely,


$$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_k X_{t-k} + u_t \tag{17.1.2}$$

which may be written more compactly as

$$Y_t = \alpha + \sum_{i=0}^{k} \beta_i X_{t-i} + u_t \tag{17.13.1}$$


⁴⁹Shirley Almon, “The Distributed Lag between Capital Appropriations and Expenditures,” Econometrica, vol. 33, January 1965, pp. 178-196.
[Figure 17.7, panels (a) to (d): $\beta_i$ plotted against the lag.]
Figure 17.7 Almon polynomial-lag scheme.


Following a theorem in mathematics known as Weierstrass' theorem, Almon assumes that $\beta_i$ can be approximated by a suitable-degree polynomial in $i$, the length of the lag.⁵⁰ For instance, if the lag scheme shown in Figure 17.7a applies, we can write


$$\beta_i = a_0 + a_1 i + a_2 i^2 \tag{17.13.2}$$


which is a quadratic, or second-degree, polynomial in $i$ (see Figure 17.7b). However, if the $\beta$'s follow the pattern of Figure 17.7c, we can write


$$\beta_i = a_0 + a_1 i + a_2 i^2 + a_3 i^3 \tag{17.13.3}$$

which is a third-degree polynomial in $i$ (see Figure 17.7d). More generally, we may write

$$\beta_i = a_0 + a_1 i + a_2 i^2 + \cdots + a_m i^m \tag{17.13.4}$$


⁵⁰Broadly speaking, the theorem states that on a finite closed interval any continuous function may be approximated uniformly by a polynomial of a suitable degree.
which is an $m$th-degree polynomial in $i$. It is assumed that $m$ (the degree of the polynomial) is less than $k$ (the maximum length of the lag).


To explain how the Almon scheme works, let us assume that the $\beta$'s follow the pattern shown in Figure 17.7a and, therefore, the second-degree polynomial approximation is appropriate. Substituting Eq. (17.13.2) into Eq. (17.13.1), we obtain


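$$
\begin{aligned}
Y_t &= \alpha + \sum_{i=0}^{k}(a_0 + a_1 i + a_2 i^2)X_{t-i} + u_t\\
&= \alpha + a_0\sum_{i=0}^{k}X_{t-i} + a_1\sum_{i=0}^{k} iX_{t-i} + a_2\sum_{i=0}^{k} i^2X_{t-i} + u_t
\end{aligned}
\tag{17.13.5}
$$

Defining

$$
\begin{aligned}
Z_{0t} &= \sum_{i=0}^{k} X_{t-i}\\
Z_{1t} &= \sum_{i=0}^{k} iX_{t-i}\\
Z_{2t} &= \sum_{i=0}^{k} i^2X_{t-i}
\end{aligned}
\tag{17.13.6}
$$

we may write Eq. (17.13.5) as

$$Y_t = \alpha + a_0 Z_{0t} + a_1 Z_{1t} + a_2 Z_{2t} + u_t \tag{17.13.7}$$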


In the Almon scheme $Y$ is regressed on the constructed variables $Z$, not the original $X$ variables. Note that Eq. (17.13.7) can be estimated by the usual OLS procedure. The estimates of $\alpha$ and $a_i$ thus obtained will have all the desirable statistical properties provided the stochastic disturbance term $u$ satisfies the assumptions of the classical linear regression model. In this respect, the Almon technique has a distinct advantage over the Koyck method. As we have seen, the Koyck method has some serious estimation problems that result from the presence of the stochastic explanatory variable $Y_{t-1}$ and its likely correlation with the disturbance term.
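Constructing the Z's is mechanical: for each t, $Z_{jt}$ is the sum of the current and lagged X's, weighted by $i^j$. The Python function below is a sketch. The name `almon_z` and its argument order are our own choices. As a check, it rebuilds the 1957 and 1958 rows of Table 17.8 from the 1954-1958 sales figures. It takes 1954 sales as 23,355, the value consistent with the Z's printed in that table.

```python
def almon_z(x, k, m):
    """Build the Almon regressors Z_jt = sum_{i=0}^{k} i**j * X_{t-i}, j = 0..m.

    x : observations X_1, ..., X_T in time order
    Returns one row [Z_0t, Z_1t, ..., Z_mt] per t from k+1 to T
    (the first k observations are lost to the lags).
    """
    rows = []
    for t in range(k, len(x)):
        # 0 ** 0 == 1 in Python, so Z_0t includes the current X_t
        rows.append([sum(i ** j * x[t - i] for i in range(k + 1))
                     for j in range(m + 1)])
    return rows

# Sales X for 1954-1958 (Table 17.8); k = 3 lags, second-degree polynomial
sales = [23355, 26480, 27740, 28736, 27248]
for year, z in zip([1957, 1958], almon_z(sales, k=3, m=2)):
    print(year, z)
```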

Once the a's are estimated from Eq. (17.13.7), the original $\beta$'s can be estimated from Eq. (17.13.2) (or more generally from Eq. [17.13.4]) as follows:

$$
\begin{aligned}
\hat\beta_0 &= \hat a_0\\
\hat\beta_1 &= \hat a_0 + \hat a_1 + \hat a_2\\
\hat\beta_2 &= \hat a_0 + 2\hat a_1 + 4\hat a_2\\
\hat\beta_3 &= \hat a_0 + 3\hat a_1 + 9\hat a_2\\
&\;\;\vdots\\
\hat\beta_k &= \hat a_0 + k\hat a_1 + k^2\hat a_2
\end{aligned}
\tag{17.13.8}
$$

Before we apply the Almon technique, we must resolve the following practical problems.

1. The maximum length of the lag $k$ must be specified in advance. Here perhaps one can follow the advice of Davidson and MacKinnon:

The best approach is probably to settle the question of lag length first, by starting with a very large value of q [the lag length] and then seeing whether the fit of the model deteriorates significantly when it is reduced without imposing any restrictions on the shape of the distributed lag.⁵¹


⁵¹Russell Davidson and James G. MacKinnon, Estimation and Inference in Econometrics, Oxford University Press, New York, 1993, pp. 675-676.
Remember that if there is some “true” lag length, choosing fewer lags will lead to the “omission of relevant variable bias,” whose consequences, as we saw in Chapter 13, can be very serious. On the other hand, choosing more lags than necessary will lead to the “inclusion of irrelevant variable bias,” whose consequences are less serious: the coefficients can still be consistently estimated by OLS, although their variances may be larger (the estimators are less efficient).

One can use the Akaike or Schwarz information criterion discussed in Chapter 13 to choose the appropriate lag length. These criteria can also be used to choose the appropriate degree of the polynomial, in addition to the approach discussed in point 2.

2. Having specified $k$, we must also specify the degree of the polynomial $m$. Generally, the degree of the polynomial should be at least one more than the number of turning points in the curve relating $\beta_i$ to $i$. Thus, in Figure 17.7a there is only one turning point; hence a second-degree polynomial will be a good approximation. In Figure 17.7c there are two turning points; hence a third-degree polynomial will provide a good approximation. A priori, however, one may not know the number of turning points, and therefore the choice of $m$ is largely subjective. However, theory may suggest a particular shape in some cases. In practice, one hopes that a fairly low-degree polynomial (say, $m$ = 2 or 3) will give good results. Having chosen a particular value of $m$, if we want to find out whether a higher-degree polynomial will give a better fit, we can proceed as follows.

Suppose we must decide between the second- and third-degree polynomials. For the second-degree polynomial the estimating equation is as given by Eq. (17.13.7). For the third-degree polynomial the corresponding equation is

$$Y_t = \alpha + a_0 Z_{0t} + a_1 Z_{1t} + a_2 Z_{2t} + a_3 Z_{3t} + u_t \tag{17.13.9}$$

where $Z_{3t} = \sum_{i=0}^{k} i^3 X_{t-i}$. After running regression (17.13.9), if we find that $a_2$ is statistically significant but $a_3$ is not, we may assume that the second-degree polynomial provides a reasonably good approximation.

Alternatively, as Davidson and MacKinnon suggest, “After q [the lag length] is determined, one can then attempt to determine d [the degree of the polynomial] once again starting with a large value and then reducing it.”⁵²

However, we must beware of the problem of multicollinearity, which is likely to arise because of the way the Z's are constructed from the X's, as shown in Eq. (17.13.6) (see also Eq. [17.13.10]). As shown in Chapter 10, in cases of serious multicollinearity, $\hat a_3$ may turn out to be statistically insignificant, not because the true $a_3$ is zero, but simply because the sample at hand does not allow us to assess the separate impact of $Z_{3t}$ on $Y$. Therefore, in our illustration, before we accept the conclusion that the third-degree polynomial is not the correct choice, we must make sure that the multicollinearity problem is not too serious, which can be done by applying the techniques discussed in Chapter 10.

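3. Once $m$ and $k$ are specified, the Z's can be readily constructed. For instance, if $m = 2$ and $k = 5$, the Z's are

$$
\begin{aligned}
Z_{0t} &= \sum_{i=0}^{5} X_{t-i} = (X_t + X_{t-1} + X_{t-2} + X_{t-3} + X_{t-4} + X_{t-5})\\
Z_{1t} &= \sum_{i=0}^{5} iX_{t-i} = (X_{t-1} + 2X_{t-2} + 3X_{t-3} + 4X_{t-4} + 5X_{t-5})\\
Z_{2t} &= \sum_{i=0}^{5} i^2X_{t-i} = (X_{t-1} + 4X_{t-2} + 9X_{t-3} + 16X_{t-4} + 25X_{t-5})
\end{aligned}
\tag{17.13.10}
$$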

⁵²Ibid., pp. 675-676.
Notice that the Z's are linear combinations of the original X's. Also notice why the Z's are likely to exhibit
multicollinearity. 
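A quick numerical look at the inventory data of Table 17.8 shows how strong this collinearity can be. The sketch below builds $Z_0$, $Z_1$, and $Z_2$ for 1957-1999 from the sales series and computes their pairwise correlations. The helper names are ours, and 1954 sales are taken as 23,355, the value consistent with the printed Z's.

```python
import math

# Sales X, 1954-1999, from Table 17.8 (millions of dollars)
sales = [23355, 26480, 27740, 28736, 27248, 30286, 30878, 30922, 33358,
         35058, 37331, 40995, 44870, 46486, 50229, 53501, 52805, 55906,
         63027, 72931, 84790, 86589, 98797, 113201, 126905, 143936, 154391,
         168129, 163351, 172547, 190682, 194538, 194657, 206326, 224619,
         236698, 242686, 239847, 250394, 260635, 279002, 299555, 309622,
         327452, 337687, 354961]

def almon_z(x, k, m):
    """Rows [Z_0t, ..., Z_mt] with Z_jt = sum_{i=0}^{k} i**j * X_{t-i}."""
    return [[sum(i ** j * x[t - i] for i in range(k + 1)) for j in range(m + 1)]
            for t in range(k, len(x))]

def corr(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

z = almon_z(sales, k=3, m=2)           # one row per year, 1957-1999
z0, z1, z2 = (list(col) for col in zip(*z))
print(f"r(Z0, Z1) = {corr(z0, z1):.4f}")
print(f"r(Z0, Z2) = {corr(z0, z2):.4f}")
print(f"r(Z1, Z2) = {corr(z1, z2):.4f}")
```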

Before proceeding to a numerical example, note the advantages of the Almon method. First, it provides a flexible method of incorporating a variety of lag structures (see Exercise 17.17). The Koyck technique, on the other hand, is quite rigid in that it assumes that the $\beta$'s decline geometrically. Second, unlike the Koyck technique, in the Almon method we do not have to worry about the presence of the lagged dependent variable as an explanatory variable in the model and the problems it creates for estimation. Finally, if a sufficiently low-degree polynomial can be fitted, the number of coefficients to be estimated (the a's) is considerably smaller than the original number of coefficients (the $\beta$'s).

But let us re-emphasize the problems with the Almon technique. First, the degree of the polynomial as well as the maximum value of the lag is largely a subjective decision. Second, for reasons noted previously, the Z variables are likely to exhibit multicollinearity. Therefore, in models like Eq. (17.13.9) the estimated a's are likely to show large standard errors (relative to the values of these coefficients), thereby rendering one or more such coefficients statistically insignificant on the basis of the conventional t test. But this does not necessarily mean that one or more of the original $\beta$ coefficients will also be statistically insignificant. (The proof of this statement is slightly involved but is suggested in Exercise 17.18.) As a result, the multicollinearity problem may not be as serious as one might think. Besides, as we know, in cases of multicollinearity, even if we cannot estimate an individual coefficient precisely, a linear combination of such coefficients (the estimable function) can be estimated more precisely.


Example 17.11 Illustration of the Almon Distributed-Lag Model 


To illustrate the Almon technique, Table 17.8 gives data on inventories Y and sales X for the United States for 
the period 1954-1999. 

For illustrative purposes, assume that inventories depend on sales in the current year and in the preceding 3 years as follows:

$$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \beta_3 X_{t-3} + u_t \tag{17.13.11}$$

Furthermore, assume that $\beta_i$ can be approximated by a second-degree polynomial as shown in Eq. (17.13.2). Then, following Eq. (17.13.7), we may write

$$Y_t = \alpha + a_0 Z_{0t} + a_1 Z_{1t} + a_2 Z_{2t} + u_t \tag{17.13.12}$$

where

$$
\begin{aligned}
Z_{0t} &= \sum_{i=0}^{3} X_{t-i} = (X_t + X_{t-1} + X_{t-2} + X_{t-3})\\
Z_{1t} &= \sum_{i=0}^{3} iX_{t-i} = (X_{t-1} + 2X_{t-2} + 3X_{t-3})\\
Z_{2t} &= \sum_{i=0}^{3} i^2X_{t-i} = (X_{t-1} + 4X_{t-2} + 9X_{t-3})
\end{aligned}
\tag{17.13.13}
$$
The Z variables thus constructed are shown in Table 17.8. Using the data on Y and the Z's, we obtain the following regression:

$$
\begin{aligned}
\hat Y_t &= 25{,}845.06 + 1.1149Z_{0t} - 0.3713Z_{1t} - 0.0600Z_{2t}\\
\text{se} &= (6596.998)\quad(0.5381)\quad(1.3743)\quad(0.4549)\\
t &= (3.9177)\quad(2.0718)\quad(-0.2702)\quad(-0.1319)\\
& R^2 = 0.9755 \quad d = 0.1643 \quad F = 517.7656
\end{aligned}
\tag{17.13.14}
$$
Note: Since we are using a 3-year lag, the total number of observations has been reduced from 46 to 43. 
Table 17.8 Inventories Y and Sales X, U.S. Manufacturing, and Constructed Z's

Observation   Inventory   Sales     Z0          Z1          Z2
1954          41,612      23,355    NA          NA          NA
1955          45,069      26,480    NA          NA          NA
1956          50,642      27,740    NA          NA          NA
1957          51,871      28,736    106,311     150,765     343,855
1958          50,203      27,248    110,204     163,656     378,016
1959          52,903      30,286    114,010     167,940     391,852
1960          53,786      30,878    117,148     170,990     397,902
1961          54,871      30,922    119,334     173,194     397,254
1962          58,172      33,358    125,444     183,536     427,008
1963          60,029      35,058    130,216     187,836     434,948
1964          63,410      37,331    136,669     194,540     446,788
1965          68,207      40,995    146,742     207,521     477,785
1966          77,986      44,870    158,254     220,831     505,841
1967          84,646      46,486    169,682     238,853     544,829
1968          90,560      50,229    182,580     259,211     594,921
1969          98,145      53,501    195,086     277,811     640,003
1970          101,599     52,805    203,021     293,417     672,791
1971          102,567     55,906    212,441     310,494     718,870
1972          108,121     63,027    225,239     322,019     748,635
1973          124,499     72,931    244,669     333,254     761,896
1974          157,625     84,790    276,654     366,703     828,193
1975          159,708     86,589    307,337     419,733     943,757
1976          174,636     98,797    343,107     474,962     1,082,128
1977          188,378     113,201   383,377     526,345     1,208,263
1978          211,691     126,905   425,492     570,562     1,287,690
1979          242,157     143,936   482,839     649,698     1,468,882
1980          265,215     154,391   538,433     737,349     1,670,365
1981          283,413     168,129   593,361     822,978     1,872,280
1982          311,852     163,351   629,807     908,719     2,081,117
1983          312,379     172,547   658,418     962,782     2,225,386
1984          339,516     190,682   694,709     1,003,636   2,339,112
1985          334,749     194,538   721,118     1,025,829   2,351,029
1986          322,654     194,657   752,424     1,093,543   2,510,189
1987          338,109     206,326   786,203     1,155,779   2,688,947
1988          369,374     224,619   820,140     1,179,254   2,735,796
1989          391,212     236,698   862,300     1,221,242   2,801,836
1990          405,073     242,686   910,329     1,304,914   2,992,108
1991          390,905     239,847   943,850     1,389,939   3,211,049
1992          382,510     250,394   969,625     1,435,313   3,340,873
1993          384,039     260,635   993,562     1,458,146   3,393,956
1994          404,877     279,002   1,029,878   1,480,964   3,420,834
1995          430,985     299,555   1,089,586   1,551,454   3,575,088
1996          436,729     309,622   1,148,814   1,639,464   3,761,278
1997          456,133     327,452   1,215,631   1,745,738   4,018,860
1998          466,798     337,687   1,274,316   1,845,361   4,261,935
1999          470,377     354,961   1,329,722   1,921,457   4,434,093

Note: Y and X are in millions of dollars, seasonally adjusted.

Source: Economic Report of the President, 2001, Table B-57, p. 340. The Z's are as shown in Eq. (17.13.13).
A brief comment on the preceding results is in order. Of the three Z variables, only $Z_0$ is individually statistically significant at the 5 percent level; the others are not. Yet the F value is so high that we can reject the null hypothesis that collectively the Z's have no effect on Y. As you may suspect, this might very well be due to multicollinearity. Also, note that the computed d value is very low. This does not necessarily mean that the residuals suffer from autocorrelation. More likely, the low d value suggests that the model we have used is mis-specified. We will comment on this shortly.

From the estimated a's given in Eq. (17.13.14), we can easily estimate the original $\beta$'s, as shown in Eq. (17.13.8). In the present example, the results are as follows:

$$
\begin{aligned}
\hat\beta_0 &= \hat a_0 = 1.1149\\
\hat\beta_1 &= (\hat a_0 + \hat a_1 + \hat a_2) = 0.6836\\
\hat\beta_2 &= (\hat a_0 + 2\hat a_1 + 4\hat a_2) = 0.1321\\
\hat\beta_3 &= (\hat a_0 + 3\hat a_1 + 9\hat a_2) = -0.5394
\end{aligned}
\tag{17.13.15}
$$

Thus, the estimated distributed-lag model corresponding to Eq. (17.13.11) is:

$$
\begin{aligned}
\hat Y_t &= 25{,}845.0 + 1.1150X_t + 0.6836X_{t-1} + 0.1321X_{t-2} - 0.5394X_{t-3}\\
\text{se} &= (6596.99)\quad(0.5381)\quad(0.4672)\quad(0.4656)\quad(0.5656)\\
t &= (3.9177)\quad(2.0718)\quad(1.4630)\quad(0.2837)\quad(-0.9537)
\end{aligned}
\tag{17.13.16}
$$
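The mapping from the $\hat a$'s to the $\hat\beta$'s in Eq. (17.13.15) is just Eq. (17.13.4) evaluated at i = 0, 1, 2, 3. A minimal Python sketch follows; the helper name `beta_from_a` is ours. Starting from the four-decimal $\hat a$'s of Eq. (17.13.14), it reproduces $\hat\beta_0$ and $\hat\beta_1$ exactly. It reproduces $\hat\beta_2$ and $\hat\beta_3$ only to within rounding (0.1323 and -0.5390 versus 0.1321 and -0.5394).

```python
# Estimated Almon coefficients a_0, a_1, a_2 from Eq. (17.13.14)
a_hat = [1.1149, -0.3713, -0.0600]

def beta_from_a(a, i):
    """beta_i = a_0 + a_1*i + a_2*i**2 + ... (Eq. 17.13.4 evaluated at lag i)."""
    return sum(coef * i ** j for j, coef in enumerate(a))

betas = [beta_from_a(a_hat, i) for i in range(4)]   # lags 0, 1, 2, 3
print([round(b, 4) for b in betas])
```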


Geometrically, the estimated $\beta_i$ is as shown in Figure 17.8.
Figure 17.8 Lag structure of the illustrative example.


Our illustrative example may be used to point out a few additional features of the Almon lag procedure: 


1. The standard errors of the a coefficients are directly obtainable from the OLS regression (17.13.14), but the standard errors of some of the $\beta$ coefficients, the objective of primary interest, cannot be so obtained. They can, however, be obtained from the variances and covariances of the estimated a coefficients by using a well-known formula from statistics, which is given in Exercise 17.18. Of course, there is no need to do this manually, for most statistical packages can do this routinely. The standard errors given in Eq. (17.13.16) were obtained from EViews 6.
2. The $\hat\beta$'s obtained in Eq. (17.13.16) are called unrestricted estimates in the sense that no a priori restrictions are placed on them. In some situations, however, one may want to impose the so-called endpoint restrictions on the $\beta$'s by assuming that $\beta_0$ and $\beta_k$ (the current and $k$th lagged coefficients) are zero. Because of psychological, institutional, or technical reasons, the value of the explanatory variable in the current period may not have any impact on the current value of the regressand, thereby justifying the zero value for $\beta_0$. By the same token, beyond a certain time the $k$th lagged coefficient may not have any impact on the regressand, thus supporting the assumption that $\beta_k$ is zero. In our inventory example (Example 17.11), the coefficient of $X_{t-3}$ had a negative sign, which may not make economic sense. Hence, one may want to constrain that coefficient to zero.⁵³ Of course, you do not have to constrain both ends; you could put a restriction only on the first coefficient, called a near-end restriction, or on the last coefficient, called a far-end restriction. For our inventory example, this is illustrated in Exercise 17.28. Sometimes the $\beta$'s are estimated with the restriction that their sum is 1. But one should not impose such restrictions mindlessly, because they also affect the values of the other (unconstrained) lagged coefficients.

3. Since the choice of the number of lagged coefficients as well as the degree of the polynomial is at the 
discretion of the modeler, some trial and error is inevitable, the charge of data mining notwithstanding. 
Here is where the Akaike and Schwarz information criteria discussed in Chapter 13 may come in 
handy. 

4. Since we estimated Eq. (17.13.16) using three lags and the second-degree polynomial, it is a restricted 
least-squares model. Suppose we decide to use three lags but do not use the Almon polynomial 
approach. That is, we estimate Eq. (17.13.11) by OLS. What then? Let us first see the results: 


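$$
\begin{aligned}
\hat Y_t &= 26{,}008.60 + 0.9771X_t + 1.0139X_{t-1} - 0.2022X_{t-2} - 0.3935X_{t-3}\\
\text{se} &= (6691.12)\quad(0.6820)\quad(1.0920)\quad(1.1021)\quad(0.7186)\\
t &= (3.8870)\quad(1.4327)\quad(0.9284)\quad(-0.1835)\quad(-0.5476)\\
& R^2 = 0.9755 \quad d = 0.1571 \quad F = 379.51
\end{aligned}
\tag{17.13.17}
$$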


If you compare these results with those given in Eq. (17.13.16), you will see that the overall $R^2$ is practically the same, although the lag pattern in (17.13.17) shows more of a humped shape than that exhibited by Eq. (17.13.16). It is left to the reader to verify the $R^2$ value from (17.13.16).
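As a quick arithmetic check on Eq. (17.13.17), each entry in the t row is just the estimate divided by its standard error. The short Python sketch below recomputes those ratios from the reported figures (the dictionary keys are illustrative labels, not names from the text) and lists which coefficients clear the rough "|t| > 2" rule of thumb. The sample size is not restated here, so that threshold is only an approximation to a 5 percent test.

```python
# Estimates and standard errors as reported in Eq. (17.13.17)
coefs = {"intercept": 26008.60, "X_t": 0.9771, "X_t-1": 1.0139, "X_t-2": -0.2022, "X_t-3": -0.3935}
ses = {"intercept": 6691.12, "X_t": 0.6820, "X_t-1": 1.0920, "X_t-2": 1.1021, "X_t-3": 0.7186}
reported_t = {"intercept": 3.8870, "X_t": 1.4327, "X_t-1": 0.9284, "X_t-2": -0.1835, "X_t-3": -0.5476}

# t ratio = estimate / standard error
t_ratios = {name: coefs[name] / ses[name] for name in coefs}

for name, t in t_ratios.items():
    # Agreement up to the four-decimal rounding used in the text
    assert abs(t - reported_t[name]) < 5e-4, name
    print(f"{name:>9}: t = {t:8.4f}")

# Rough "2-t" rule of thumb for significance
large = [name for name, t in t_ratios.items() if abs(t) > 2]
print("Coefficients with |t| > 2:", large)
```

Only the intercept passes the rule of thumb. That is the pattern one would expect if successive lags of the regressor are highly collinear, which inflates the standard errors of the individual lag coefficients.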

As this example illustrates, one has to be careful in using the Almon distributed-lag technique, as the results might be sensitive to the choice of the degree of the polynomial and/or the number of lagged coefficients.
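To make the mechanics of the Almon approach and the endpoint restrictions concrete, here is a minimal Python sketch. It assumes the second-degree form $\beta_i = a_0 + a_1 i + a_2 i^2$, $i = 0, \ldots, k$. Substituting this into the distributed-lag sum gives regressors $Z_{jt} = \sum_i i^j X_{t-i}$.

The two endpoint restrictions work as follows:

- The near-end restriction $\beta_0 = 0$ forces $a_0 = 0$.
- The far-end restriction $\beta_k = 0$ forces $a_0 = -\sum_{j \ge 1} a_j k^j$, so each weight becomes $\sum_{j \ge 1} a_j (i^j - k^j)$.

The function names are hypothetical, and only one restriction at a time is handled.

```python
def _weight(i, j, k, far_end_zero):
    """Weight on X_{t-i} in Almon regressor j: i**j, or i**j - k**j when beta_k = 0."""
    return i ** j - (k ** j if far_end_zero else 0)


def _powers(degree, restricted):
    # Either endpoint restriction eliminates the constant term a0 (power 0)
    return range(1 if restricted else 0, degree + 1)


def almon_regressors(x, k, degree=2, near_end_zero=False, far_end_zero=False):
    """Rows of Almon regressors Z_jt for t = k, ..., len(x) - 1."""
    if near_end_zero and far_end_zero:
        raise ValueError("this sketch imposes one endpoint restriction at a time")
    restricted = near_end_zero or far_end_zero
    return [[sum(_weight(i, j, k, far_end_zero) * x[t - i] for i in range(k + 1))
             for j in _powers(degree, restricted)]
            for t in range(k, len(x))]


def almon_betas(a, k, degree=2, near_end_zero=False, far_end_zero=False):
    """Recover lag weights beta_0..beta_k from coefficients a on the Almon regressors."""
    restricted = near_end_zero or far_end_zero
    return [sum(aj * _weight(i, j, k, far_end_zero)
                for aj, j in zip(a, _powers(degree, restricted)))
            for i in range(k + 1)]


# Hypothetical illustration: k = 3 lags, unrestricted (a0, a1, a2)
x = [float(v) for v in range(1, 11)]
k = 3
a = [0.5, 0.3, -0.1]
Z = almon_regressors(x, k)
betas = almon_betas(a, k)
# The identity sum_i beta_i X_{t-i} = sum_j a_j Z_jt holds row by row
for t, row in zip(range(k, len(x)), Z):
    lhs = sum(b * x[t - i] for i, b in enumerate(betas))
    rhs = sum(aj * z for aj, z in zip(a, row))
    assert abs(lhs - rhs) < 1e-9

far = almon_betas([-0.1, -0.05], k, far_end_zero=True)   # beta_3 forced to 0
near = almon_betas([0.4, -0.1], k, near_end_zero=True)   # beta_0 forced to 0
```

In practice one would regress Y on an intercept and the Z columns by OLS and then map the estimated $a$'s back to the β's with `almon_betas`. Because the far-end restriction ties $a_0$ to $a_1$ and $a_2$, imposing it shifts every other lag weight, not just $\beta_k$. This is the caution raised in point 2 above.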


17.14 Causality in Economics: The Granger Causality Test⁵⁴


Back in Section 1.4 we noted that, although regression analysis deals with the dependence of one variable 
on other variables, it does not necessarily imply causation. In other words, the existence of a relationship 
between variables does not prove causality or the direction of influence. But in regressions involving time 
series data, the situation may be somewhat different because, as one author puts it, 


53For a concrete application, see D. B. Batten and Daniel Thornton, “Polynomial Distributed Lags and the Estimation of the
St. Louis Equation,” Review, Federal Reserve Bank of St. Louis, April 1983, pp. 13-25. 


54There is another test of causality that is sometimes used, the so-called Sims test of causality. We discuss it by way of
an exercise. 
... time does not run backward. That is, if event A happens before event B, then it is possible that A is causing B. However, it is not possible that B is causing A. In other words, events in the past can cause events to happen today. Future events cannot.⁵⁵ [Emphasis added.]


This is roughly the idea behind the so-called Granger causality test.⁵⁶ But it should be noted clearly that the question of causality is deeply philosophical, with all kinds of controversies. At one extreme are people who believe that “everything causes everything,” and at the other extreme are people who deny the existence of causation whatsoever.⁵⁷ The econometrician Edward Leamer prefers the term precedence over causality.
Francis Diebold prefers the term predictive causality. As he writes: 


... the statement “$y_i$ causes $y_j$” is just shorthand for the more precise, but long-winded, statement, “$y_i$ contains useful information for predicting $y_j$ (in the linear least squares sense), over and above the past histories of the other variables in the system.” To save space, we simply say that $y_i$ causes $y_j$.⁵⁸


The Granger Test 


To explain the Granger test, we will consider the often asked question in macroeconomics: Is it GDP that “causes” the money supply M (GDP → M)? Or is it the money supply M that causes GDP (M → GDP)? (The arrow points in the direction of causality.) The Granger causality test assumes that the information relevant to the prediction of the respective variables, GDP and M, is contained solely in the time series data on these variables. The test involves estimating the following pair of regressions:


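$$
\text{GDP}_t = \sum_{i=1}^{n} \alpha_i M_{t-i} + \sum_{j=1}^{n} \beta_j \text{GDP}_{t-j} + u_{1t} \tag{17.14.1}
$$

$$
M_t = \sum_{i=1}^{n} \lambda_i M_{t-i} + \sum_{j=1}^{n} \delta_j \text{GDP}_{t-j} + u_{2t} \tag{17.14.2}
$$

where it is assumed that the disturbances $u_{1t}$ and $u_{2t}$ are uncorrelated. In passing, note that, since we have two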
variables, we are dealing with bilateral causality. In the chapters on time series econometrics, we will extend 
this to multivariable causality through the technique of vector autoregression (VAR). 
Equation (17.14.1) postulates that current GDP is related to past values of itself as well as those of M, and Eq. (17.14.2) postulates a similar behavior for M. Note that these regressions can be cast in growth forms, $\dot{\text{GDP}}$ and $\dot{M}$, where a dot over a variable indicates its growth rate. We now distinguish four cases:


1. Unidirectional causality from M to GDP is indicated if the estimated coefficients on the lagged M in 
Eq. (17.14.1) are statistically different from zero as a group and the set of estimated coefficients on the 
lagged GDP in Eq. (17.14.2) is not statistically different from zero. 

2. Conversely, unidirectional causality from GDP to M exists if the set of lagged M coefficients in Eq. 
(17.14.1) is not statistically different from zero and the set of the lagged GDP coefficients in Eq. 
(17.14.2) is statistically different from zero. 


55Gary Koop, Analysis of Economic Data, John Wiley & Sons, New York, 2000, p. 175.

56C. W. J. Granger, “Investigating Causal Relations by Econometric Models and Cross-Spectral Methods,” Econometrica, July 1969, pp. 424-438. Although popularly known as the Granger causality test, it is appropriate to call it the Wiener-Granger causality test, for it was earlier suggested by Wiener. See N. Wiener, “The Theory of Prediction,” in E. F. Beckenbach, ed., Modern Mathematics for Engineers, McGraw-Hill, New York, 1956, pp. 165-190.

57For an excellent discussion of this topic, see Arnold Zellner, “Causality and Econometrics,” Carnegie-Rochester Conference 
Series, 10, K. Brunner and A. H. Meltzer, eds., North Holland Publishing Company, Amsterdam, 1979, pp. 9-50. 


58Francis X. Diebold, Elements of Forecasting, South Western Publishing, 2d ed., 2001, p. 254. 
3. Feedback, or bilateral causality, is suggested when the sets of M and GDP coefficients are statistically significantly different from zero in both regressions.

4. Finally, independence is suggested when the sets of M and GDP coefficients are not statistically significant in either of the regressions.
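The four cases reduce to a two-by-two decision table. The Python sketch below encodes that table. The function name and its Boolean arguments are hypothetical. Each Boolean stands for "the F test rejects the null that this set of lagged coefficients is zero".

```python
def classify_granger(x_to_y_significant: bool, y_to_x_significant: bool,
                     x_name: str = "X", y_name: str = "Y") -> str:
    """Map the two group-significance decisions onto the four cases in the text.

    x_to_y_significant: lagged X terms are jointly significant in the Y equation.
    y_to_x_significant: lagged Y terms are jointly significant in the X equation.
    """
    if x_to_y_significant and not y_to_x_significant:
        return f"unidirectional causality from {x_name} to {y_name}"
    if y_to_x_significant and not x_to_y_significant:
        return f"unidirectional causality from {y_name} to {x_name}"
    if x_to_y_significant and y_to_x_significant:
        return "feedback (bilateral causality)"
    return "independence"


# Usage with the F values and 5 percent critical value (2.50) reported in Example 17.12
critical = 2.50
verdict = classify_granger(2.68 > critical, 0.56 > critical, "M", "GNP")
print(verdict)
```

With Hafer's figures the sketch returns unidirectional causality from M to GNP. This agrees with the conclusion drawn in Example 17.12.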


More generally, since the future cannot predict the past, if variable X (Granger) causes variable Y, then 
changes in X should precede changes in Y. Therefore, in a regression of Y on other variables (including its 
own past values) if we include past or lagged values of X and it significantly improves the prediction of Y, 
then we can say that X (Granger) causes Y. A similar definition applies if Y (Granger) causes X. 

The steps involved in implementing the Granger causality test are as follows. We illustrate these steps with 
the GDP-money example given in Eq. (17.14.1). 


1. Regress current GDP on all lagged GDP terms and other variables, if any, but do not include the lagged M variables in this regression. As per Chapter 8, this is the restricted regression. From this regression obtain the restricted residual sum of squares, $\text{RSS}_R$.

2. Now run the regression including the lagged M terms. In the language of Chapter 8, this is the unrestricted regression. From this regression obtain the unrestricted residual sum of squares, $\text{RSS}_{UR}$.

3. The null hypothesis is $H_0\colon \alpha_i = 0,\ i = 1, 2, \ldots, n$, that is, lagged M terms do not belong in the regression.

4. To test this hypothesis, we apply the F test given by Eq. (8.7.9), namely,

$$
F = \frac{(\text{RSS}_R - \text{RSS}_{UR})/m}{\text{RSS}_{UR}/(n - k)} \tag{8.7.9}
$$

which follows the F distribution with m and (n − k) df. Here n denotes the number of observations, m is equal to the number of lagged M terms, and k is the number of parameters estimated in the unrestricted regression.

5. If the computed F value exceeds the critical F value at the chosen level of significance, we reject the null hypothesis, in which case the lagged M terms belong in the regression. This is another way of saying that M causes GDP.

6. Steps 1 to 5 can be repeated to test the model (17.14.2), that is, whether GDP causes M.
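Steps 1 to 5 can be sketched in a few dozen lines of dependency-free Python. The sketch makes the following assumptions:

- The same number of lags m is used for both variables.
- An intercept is included.
- Both regressions are run on the same sample.
- OLS is solved through the normal equations.

The data are synthetic and hypothetical. Series x drives y with a one-period lag, and y does not feed back into x. The names `ols_rss` and `granger_f` are illustrative, not from any particular package.

```python
import random


def ols_rss(X, y):
    """Fit y = Xb by OLS (normal equations, Gauss-Jordan elimination) and return the RSS."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    b = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yi in zip(X, y))


def granger_f(y, x, m):
    """F statistic of Eq. (8.7.9) for H0: the m lagged x terms do not belong
    in the regression of y on an intercept and m of its own lags."""
    restricted, unrestricted, target = [], [], []
    for t in range(m, len(y)):            # identical sample for both regressions
        own_lags = [y[t - j] for j in range(1, m + 1)]
        x_lags = [x[t - i] for i in range(1, m + 1)]
        restricted.append([1.0] + own_lags)           # step 1: restricted regression
        unrestricted.append([1.0] + own_lags + x_lags)  # step 2: unrestricted regression
        target.append(y[t])
    rss_r = ols_rss(restricted, target)
    rss_ur = ols_rss(unrestricted, target)
    n, k = len(target), 1 + 2 * m         # observations, parameters in unrestricted model
    return ((rss_r - rss_ur) / m) / (rss_ur / (n - k))


# Hypothetical data: x Granger-causes y, but y does not Granger-cause x
random.seed(1)
T = 200
x = [random.gauss(0.0, 1.0) for _ in range(T)]
y = [0.0]
for t in range(1, T):
    y.append(0.5 * y[-1] + 0.8 * x[t - 1] + random.gauss(0.0, 1.0))

f_x_to_y = granger_f(y, x, m=2)   # does x Granger-cause y?
f_y_to_x = granger_f(x, y, m=2)   # step 6: reverse the roles
print(f"x -> y: F = {f_x_to_y:.2f};  y -> x: F = {f_y_to_x:.2f}")
```

Each F value would then be compared with the critical F(m, n − k) value, as in step 5. Real applications would normally use a tested econometrics package. The explicit version is shown only to mirror the steps above.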


Before we illustrate the Granger causality test, there are several things that need to be noted: 


1. It is assumed that the two variables, GDP and M, are stationary. We have already discussed the concept of stationarity in intuitive terms before and will discuss it more formally in Chapter 21. Sometimes taking the first differences of the variables makes them stationary, if they are not already stationary in the level form.

2. The number of lagged terms to be introduced in the causality tests is an important practical question. As in the case of the distributed-lag models, we may have to use the Akaike or Schwarz information criterion to make the choice. But it should be added that the direction of causality may depend critically on the number of lagged terms included.

3. We have assumed that the error terms entering the causality test are uncorrelated. If this is not the case, an appropriate transformation, as discussed in Chapter 12, may have to be applied.⁵⁹

4. Since our interest is in testing for causality, one need not present the estimated coefficients of models (17.14.1) and (17.14.2) explicitly (to save space); just the results of the F test given in Eq. (8.7.9) will suffice.

5. One has to guard against “spurious” causality. In our GDP-money example, suppose we consider the interest rate, say the short-term interest rate. It is quite possible that money “Granger-causes” the interest rate


59For further details, see Wojciech W. Charemza and Derek F. Deadman, New Directions in Econometric Practice: General to 
Specific Modelling, Cointegration and Vector Autoregression, 3d ed., Edward Elgar Publishing, 1997, Chapter 6. 
and the interest rate in turn “Granger-causes” GDP. Therefore, if we do not account for the interest rate and find that it is money that causes GDP, the observed causality between GDP and money may be spurious.⁶⁰ As noted previously, one way of dealing with this is to consider a multiple-equation system, such as vector autoregression (VAR), which we will discuss at some length in Chapter 22.
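Point 1 above mentions first differencing. Example 17.12 below works with growth rates rather than levels. Both transformations are one-liners, sketched here in Python with hypothetical helper names.

The text does not specify how Hafer computed his growth rates. Simple percentage changes and log differences are shown as two common conventions, not as his method. Whether a transformed series is actually stationary still has to be checked, for example with the formal tests of Chapter 21.

```python
import math


def first_difference(series):
    """Delta X_t = X_t - X_{t-1}; the result is one element shorter than the input."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def growth_rate(series, percent=True):
    """Simple period-to-period growth rate (X_t - X_{t-1}) / X_{t-1}."""
    scale = 100.0 if percent else 1.0
    return [scale * (curr - prev) / prev for prev, curr in zip(series, series[1:])]


def log_difference(series):
    """ln X_t - ln X_{t-1}, a common approximation to the growth rate."""
    return [math.log(curr) - math.log(prev) for prev, curr in zip(series, series[1:])]


levels = [100.0, 110.0, 121.0]      # hypothetical series growing 10 percent per period
print(first_difference(levels))     # absolute changes
print(growth_rate(levels))          # percentage changes
print(log_difference(levels))       # close to 0.1 per period for small changes
```

Each transformation loses the first observation. When the transformed series then enter a lagged regression, m further observations are lost to the lags.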


Example 17.12 Causality between Money and Income 


R. W. Hafer used the Granger test to find out the nature of causality between GNP (rather than GDP) and M for the United States for the period 1960-I to 1980-IV. Instead of using the gross values of these variables, he used their growth rates, $\dot{\text{GNP}}$ and $\dot{M}$, and used four lags of each variable in the two regressions given previously. The results were as follows.⁶¹ The null hypothesis in each case is that the variable under consideration does not “Granger-cause” the other variable.


Direction of causality                        F value    Decision
$\dot{M} \rightarrow \dot{\text{GNP}}$        2.68       Reject
$\dot{\text{GNP}} \rightarrow \dot{M}$        0.56       Do not reject


These results suggest that the direction of causality is from money growth to GNP growth since the 
estimated F is significant at the 5 percent level; the critical F value is 2.50 (for 4 and 71 df). On the other 
hand, there is no “reverse causation” from GNP growth to money growth, since the F value is statistically 
insignificant. 
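The critical value of 2.50 for 4 and 71 df quoted above can be reproduced without statistical tables. The Python sketch below uses only the standard library. It evaluates the F upper-tail probability through the regularized incomplete beta function, computed with the standard continued-fraction (modified Lentz) method, and finds critical values by bisection. The function names are hypothetical. In day-to-day work a statistical library's F distribution routines would be the natural choice.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz's method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))            # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))   # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    ln_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(ln_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(ln_front) * _betacf(b, a, 1.0 - x) / b


def f_pvalue(f, df1, df2):
    """Upper-tail probability P(F > f) for an F(df1, df2) random variable."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_critical(alpha, df1, df2, hi=1000.0):
    """Critical value c such that P(F > c) = alpha, found by bisection."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_pvalue(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


crit = f_critical(0.05, 4, 71)          # compare with the 2.50 quoted in the text
p_money_to_gnp = f_pvalue(2.68, 4, 71)
p_gnp_to_money = f_pvalue(0.56, 4, 71)
print(f"critical F = {crit:.3f}, p(M->GNP) = {p_money_to_gnp:.3f}, p(GNP->M) = {p_gnp_to_money:.3f}")
```

Reporting exact p values in this way avoids relying on the nearest tabulated degrees of freedom. It also makes borderline results explicit, such as the "Reject (at 7%)" entry in Example 17.13.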


Example 17.13 Causality between Money and Interest Rate in Canada 


Refer to the Canadian data given in Table 17.5. Suppose we want to find out if there is any causality between 
money supply and interest rate in Canada over the quarterly period 1979-1988. To show that the Granger
causality test depends critically on the number of lagged terms introduced in the model, we present below 
the results of the F test using several (quarterly) lags. In each case, the null hypothesis is that interest rate does 
not (Granger-) cause money supply and vice versa. 


Direction of causality    Number of lags    F value    Decision
R → M                     2                 12.92      Reject
M → R                     2                 3.22       Reject
R → M                     4                 5.59       Reject
M → R                     4                 2.45       Reject (at 7%)
R → M                     6                 3.5163     Reject
M → R                     6                 2.71       Reject
R → M                     8                 1.40       Do not reject
M → R                     8                 1.62       Do not reject


Note these features of the preceding results of the F test: Up to six lags, there is bilateral causality between 
money supply and interest rate. However, at eight lags, there is no statistically discernible relationship between 
the two variables. This reinforces the point made earlier that the outcome of the Granger test is sensitive to 
the number of lags introduced in the model. 
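Because the verdict changes with the lag length, point 2 earlier in this section suggests letting the Akaike or Schwarz criterion of Chapter 13 guide the choice. The Python sketch below uses the log forms $\ln \text{AIC} = 2k/n + \ln(\text{RSS}/n)$ and $\ln \text{SIC} = (k/n)\ln n + \ln(\text{RSS}/n)$.

The RSS values are invented purely for illustration. They are not the Canadian results. The sketch also assumes that every candidate regression is fitted on the same n observations, starting after the longest lag considered, so that the criteria are comparable.

```python
import math


def ln_aic(rss, n, k):
    """ln AIC = 2k/n + ln(RSS/n)."""
    return 2.0 * k / n + math.log(rss / n)


def ln_sic(rss, n, k):
    """ln SIC = (k/n) ln n + ln(RSS/n)."""
    return (k / n) * math.log(n) + math.log(rss / n)


def choose_lags(fits, criterion):
    """fits maps lag length m -> (RSS, n, k); return the m that minimizes the criterion."""
    return min(fits, key=lambda m: criterion(*fits[m]))


# Hypothetical unrestricted Granger regressions: m lags of each variable plus an
# intercept, so k = 2m + 1, all fitted on a common sample of n = 32 observations.
fits = {2: (60.0, 32, 5), 4: (40.0, 32, 9), 6: (37.0, 32, 13), 8: (36.0, 32, 17)}
best_aic = choose_lags(fits, ln_aic)
best_sic = choose_lags(fits, ln_sic)
print("AIC picks", best_aic, "lags; SIC picks", best_sic, "lags")
```

With these made-up numbers the two criteria disagree: AIC picks 4 lags and SIC picks 2, because SIC penalizes additional parameters more heavily. This is one more reason to report the Granger results for more than one lag length, as Example 17.13 does.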
60On this, see J. H. Stock and M. W. Watson, “Interpreting the Evidence on Money-Income Causality,” Journal of Econometrics, vol. 40, 1989, pp. 783-820.
61R. W. Hafer, “The Role of Fiscal Policy in the St. Louis Equation,” Review, Federal Reserve Bank of St. Louis, January 1982, 
pp. 17-22. See his footnote 12 for the details of the procedure. 
Example 17.14 Causality between GDP Growth Rate and Gross Savings Rate in Nine East Asian Countries


A study of the bilateral causality between GDP growth rate (g) and gross savings rate (s) showed the results given in Table 17.9.⁶² For comparison, the results for the United States are also presented in the table. By and large, the results presented in Table 17.9 show that for most East Asian countries the causality runs from the GDP growth rate to the gross savings rate. By contrast, for the United States for the period 1950-1988, causality ran in both directions up to lag 3, but for lags 4 and 5 it ran from the GDP growth rate to the savings rate and not the other way round.


Table 17.9 Tests of Bivariate Granger Causality between the Real Per Capita GDP Growth Rate and the Gross Savings Rate

                            Lagged right-hand-side variable (results at lags 1, 2, 3, 4, 5)
Economy, years              Savings                        Growth
Hong Kong, 1960-88          Sig  Sig  Sig  Sig  Sig        Sig  Sig  Sig  Sig  Sig
Indonesia, 1965             Sig  NS   NS   NS   NS         Sig  Sig  Sig  Sig  Sig
Japan, 1950-88              NS   NS   NS   NS   NS         Sig  Sig  Sig  Sig  Sig
Korea, Rep. of, 1955-88     Sig  NS   NS   NS   NS         Sig  Sig  Sig  Sig  Sig
Malaysia, 1955-88           Sig  Sig  NS   NS   NS         Sig  Sig  NS   NS   Sig
Philippines, 1950-88        NS   NS   NS   NS   NS         Sig  Sig  Sig  Sig  Sig
Singapore, 1960-88          NS   NS   NS   Sig  Sig        NS   NS   NS   NS   NS
Taiwan, China, 1950-88      Sig  NS   NS   NS   NS         Sig  Sig  Sig  Sig  Sig
Thailand, 1950-88           NS   NS   NS   NS   NS         Sig  Sig  Sig  Sig  Sig
United States, 1950-88      Sig  Sig  Sig  NS   NS         Sig  Sig  Sig  Sig  Sig


Sig: Significant; NS: Not significant. 
Note: Growth is real per capita GDP growth at 1985 international prices. 


Source: World Bank, The East Asian Miracle: Economic Growth and Public Policy, Oxford University Press, New York, 1993, p. 244 (Table AS-2).
The original source is Robert Summers and Alan Heston, “The Penn World Tables (Mark 5): An Expanded Set of International Comparisons, 1950-88,” 
Quarterly Journal of Economics, vol. 105, no. 2, 1991. 


To conclude our discussion of Granger causality, keep in mind that the question we are examining is 
whether statistically one can detect the direction of causality when temporally there is a lead-lag relationship 
between two variables. If causality is established, it suggests that one can use a variable to better predict the 
other variable than simply the past history of that other variable. In the case of the East Asian economies, it 
seems that we can better predict the gross savings rate by considering the lagged values of the GDP growth 
rate than merely the lagged values of the gross savings rate. 


62These results are obtained from The East Asian Miracle: Economic Growth and Public Policy, published for the World Bank 
by Oxford University Press, 1993, p. 244. 
*A Note on Causality and Exogeneity


As we will study in the chapters on simultaneous-equation models in Part 4 of this text, economic variables 
are often classified into two broad categories, endogenous and exogenous. Loosely speaking, endogenous 
variables are the equivalent of the dependent variable in the single-equation regression model and exogenous 
variables are the equivalent of the X variables, or regressors, in such a model, provided the X variables are 
uncorrelated with the error term in that equation.⁶³

Now we raise an interesting question: Suppose in a Granger causality test we find that an X variable 
(Granger-) causes a Y variable without being caused by the latter (i.e., no bilateral causality). Can we then 
treat the X variable as exogenous? In other words, can we use Granger causality (or noncausality) to establish 
exogeneity? 

To answer this question, we need to distinguish three types of exogeneity: (1) weak, (2) strong, and (3) super. To keep the exposition simple, suppose we consider only two variables, $Y_t$ and $X_t$, and further suppose we regress $Y_t$ on $X_t$. We say that $X_t$ is weakly exogenous if $Y_t$ also does not explain $X_t$. In this case estimation and testing of the regression model can be done conditional on the values of $X_t$. As a matter of fact, going back to Chapter 2, you will realize that our regression modeling was conditional on the values of the X variables. $X_t$ is said to be strongly exogenous if current and lagged Y values do not explain it (i.e., there is no feedback relationship). And $X_t$ is super-exogenous if the parameters in the regression of Y on X do not change even if the X values change; that is, the parameter values are invariant to changes in the value(s) of X. If that is in fact the case, then the famous “Lucas critique” may lose its force.⁶⁴

The reason for distinguishing the three types of exogeneity is that, “In general, weak exogeneity is all that 
is needed for estimating and testing, strong exogeneity is necessary for forecasting and super exogeneity for 
policy analysis.”⁶⁵

Returning to Granger causality, if a variable, say Y, does not cause another variable, say X, can we then assume that the latter is exogenous? Unfortunately, the answer is not straightforward. If we are talking about weak exogeneity, it can be shown that Granger noncausality is neither necessary nor sufficient to establish exogeneity. On the other hand, the absence of Granger causality from Y to X is necessary (but not sufficient) for X to be strongly exogenous. The proofs of these statements are beyond the scope of this book.⁶⁶ For our purpose, then, it is better to keep the concepts of Granger causality and exogeneity separate and treat the former as a useful descriptive tool for time series data. In Chapter 19 we will discuss a test that can be used to find out if a variable can be treated as exogenous.


*Optional. 

63Of course, if the explanatory variables include one or more lagged terms of the endogenous variable, this requirement
may not be fulfilled. 

64The Nobel laureate Robert Lucas put forth the proposition that existing relations between economic variables may change 
when policy changes, in which case the estimated parameters from a regression model will be of little value for prediction. 
On this, see Olivier Blanchard, Macroeconomics, Prentice Hall, 1997, pp. 371-372.

65Keith Cuthbertson, Stephen G. Hall, and Mark P. Taylor, Applied Econometric Techniques, University of Michigan Press, 
1992, p. 100. 

66For a comparatively simple discussion, see G. S. Maddala, Introduction to Econometrics, 2d ed., Macmillan, New York, 
1992, pp. 394-395, and also David F. Hendry, Dynamic Econometrics, Oxford University Press, New York, 1995, Chapter 5. 
For psychological, technological, and institutional reasons, a regressand may respond to a regressor(s)
with a time lag. Regression models that take into account time lags are known as dynamic or lagged 
regression models. 

There are two types of lagged models: distributed-lag and autoregressive. In the former, the current 
and lagged values of regressors are explanatory variables. In the latter, the lagged value(s) of the 
regressand appears as an explanatory variable(s). 

A purely distributed-lag model can be estimated by OLS, but in that case there is the problem of multicollinearity, since successive lagged values of a regressor tend to be correlated.

As a result, some shortcut methods have been devised. These include the Koyck, the adaptive expectations, and the partial adjustment mechanisms, the first being a purely algebraic approach and the other two being based on economic principles.

A unique feature of the Koyck, adaptive expectations, and partial adjustment models is that they 
all are autoregressive in nature in that the lagged value(s) of the regressand appears as one of the 
explanatory variables. 

Autoregressiveness poses estimation challenges; if the lagged regressand is correlated with the error term, OLS estimators of such models are not only biased but also inconsistent. Bias and inconsistency are the case with the Koyck and the adaptive expectations models; the partial adjustment model is different in that it can be consistently estimated by OLS despite the presence of the lagged regressand. To estimate the Koyck and adaptive expectations models consistently, the most popular method is the method of instrumental variables. The instrumental variable is a proxy variable for the lagged regressand but with the property that it is uncorrelated with the error term.

An alternative to the lagged regression models just discussed is the Almon polynomial distributed-lag model, which avoids the estimation problems associated with the autoregressive models. The major
problem with the Almon approach, however, is that one must prespecify both the lag length and the 
degree of the polynomial. There are both formal and informal methods of resolving the choice of the 
lag length and the degree of the polynomial. 

Despite the estimation problems, which can be surmounted, the distributed-lag and autoregressive models have proved extremely useful in empirical economics because they make the otherwise static economic theory a dynamic one by taking into account explicitly the role of time. Such models help us to distinguish between the short- and the long-run responses of the dependent variable to a unit change in the value of the explanatory variable(s). Thus, for estimating short- and long-run price, income, substitution, and other elasticities these models have proved to be highly useful.⁶⁷

Because of the lags involved, distributed and/or autoregressive models raise the topic of causality in 
economic variables. In applied work, Granger causality modeling has received considerable attention. 
But one has to exercise great caution in using the Granger methodology because it is very sensitive to 
the lag length used in the model. 

Even if a variable (X) “Granger-causes” another variable (Y), it does not mean that X is exogenous. We distinguished three types of exogeneity (weak, strong, and super) and pointed out the importance of the distinction.
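To make the short-run versus long-run distinction stressed in this summary concrete for the Koyck scheme, the Python sketch below computes the following quantities:

- the lag weights $\beta_k = \beta_0 \lambda^k$;
- the short-run (impact) multiplier $\beta_0$;
- the long-run multiplier $\beta_0/(1-\lambda)$;
- the mean lag $\lambda/(1-\lambda)$;
- the median lag $-\log 2/\log \lambda$.

These are the standard Koyck results discussed earlier in the chapter. The values $\beta_0 = 0.4$ and $\lambda = 0.6$ are hypothetical, and the function names are illustrative.

```python
import math


def koyck_weights(beta0, lam, n_lags):
    """Lag weights beta_k = beta0 * lam**k for k = 0, ..., n_lags - 1."""
    return [beta0 * lam ** k for k in range(n_lags)]


def koyck_summary(beta0, lam):
    """Short- and long-run multipliers and lag statistics of the Koyck scheme."""
    if not 0.0 < lam < 1.0:
        raise ValueError("the Koyck scheme requires 0 < lambda < 1")
    return {
        "short_run_multiplier": beta0,
        "long_run_multiplier": beta0 / (1.0 - lam),
        "mean_lag": lam / (1.0 - lam),
        "median_lag": -math.log(2.0) / math.log(lam),
    }


summary = koyck_summary(0.4, 0.6)   # hypothetical beta0 and lambda
weights = koyck_weights(0.4, 0.6, 200)
print(summary)
print("sum of the first 200 lag weights:", sum(weights))
```

The partial sum of the geometric lag weights converges to the long-run multiplier. A λ closer to 1 means slower decay and therefore longer mean and median lags.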


67For applications of these models, see Arnold C. Harberger, ed., The Demand for Durable Goods, University of Chicago Press, 
Chicago, 1960. 
Multiple Choice Questions


1. A regression model that includes both the current and past values of explanatory variables is called
a. Autoregressive model 
b. Distributed-lag model 
c. Fixed effects model 
d. Linear probability time series model 
2. A regression model that includes the lagged values of the dependent variable among its explanatory variables is called
a. Autoregressive model 
b. Distributed-lag model 
c. Fixed effects model 
d. Linear probability time series model 
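3. Given the regression model $Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_k X_{t-k} + u_t$, the short-run multiplier is given by
a. $\beta_0$
b. $\beta_0 + \beta_1$
c. $\beta_0 + \beta_1 + \beta_2$
d. $\beta_0 + \beta_1 + \beta_2 + \cdots + \beta_k$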
4. Given the model as in Question (3) above, the interim multiplier for the second period is given by
a. $\beta_0$
b. $\beta_0 + \beta_1$
c. $\beta_0 + \beta_1 + \beta_2$
d. $\beta_0 + \beta_1 + \beta_2 + \cdots + \beta_k$
5. Given the model as in Question (3) above, the long-run distributed-lag multiplier is given by
a. $\beta_0$
b. $\beta_0 + \beta_1$
c. $\beta_0 + \beta_1 + \beta_2$
d. $\beta_0 + \beta_1 + \beta_2 + \cdots + \beta_k$
6. Given the model as in Question (3) above, the total distributed-lag multiplier is given by
a. $\beta_0$
b. $\beta_0 + \beta_1$
c. $\beta_0 + \beta_1 + \beta_2$
d. $\beta_0 + \beta_1 + \beta_2 + \cdots + \beta_k$
7. Profits of a firm depend on the current sales and past-period (t − 1) sales of the firm. This is an example of
a. Distributed lag model 
b. Autoregressive model 
c. Linear Programming Model 
d. Lagged model 
8. This year’s agricultural output depends not only on land quality, rainfall, and crop type but also on the income the farmer earned last year. A good proxy for that income is last year’s agricultural output, since the price of a crop is the same for all farmers. This is an example of
a. Distributed lag model 
b. Autoregressive model 
c. Linear Programming Model
d. Lagged model 
9. The regression model $Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_k X_{t-k} + u_t$ is called
a. Autoregressive model of order k 
b. Infinite distributed-lag model 
c. Finite distributed-lag model 
d. Infinite autoregressive model 
10. Using a sequential procedure in estimating the regression model given in Question (9) above, by first regressing $Y_t$ on $X_t$, then $Y_t$ on $X_t$ and $X_{t-1}$, and so on, is known as
a. Ad hoc estimation method 
b. Koyck approach 
c. The method of instrumental variables 
d. The Almon approach 
11. In the Koyck method, $\beta_k = \beta_0 \lambda^k$, $k = 0, 1, \ldots$, and $0 < \lambda < 1$; $(1 - \lambda)$ is known as
a. The rate of decline of the distributed lags 
b. The rate of decay of the distributed lags 
c. The speed of adjustment of the distributed lags 
d. The estimate of the distributed lags 
12. In the Koyck model, the closer $\lambda$ (lambda) is to 1, the rate of decline in $\beta_k$
a. is faster 
b. is slower 
c. Depends on $\beta_k$
d. Depends on k 
13. In the Koyck transformation,
a. Distributed-lag model is converted into an autoregressive model 
b. An autoregressive model is converted into a distributed-lag model
c. Infinite distributed-lag model is converted to finite distributed-lag model 
d. Finite distributed-lag model is converted to infinite distributed-lag model 
14. The Koyck transformation model underlies
a. Adaptive expectation model 
b. Stock adjustment model
c. Rational expectation hypothesis model 
d. Both a and b above 
15. Which of the following critical classical assumptions is not satisfied by the Koyck transformation?
a. The regression model is correctly specified. 
b. There is no perfect linear relationship between the explanatory variables. 
c. The number of observations must be greater than the number of explanatory variables. 
d. The values of the explanatory variables are non-stochastic. 
16. In a regression model, the explanatory variables are found to be correlated with the error term. OLS
estimation of this regression model would result in the parameters being 
a. Biased 
b. Inconsistent 
c. Biased and inconsistent 
d. Unbiased but inconsistent 
17. The best method to estimate the Koyck model is
a. OLS technique 
b. Instrumental variable method 
c. GLS method 
d. MLE method 
18. The large-sample test of first-order serial correlation in autoregressive models proposed by Durbin is
a. Durbin-Watson h test
b. Durbin-Watson d test
c. Durbin t-test 
d. Durbin F-test 
19. The Almon technique of estimating distributed-lag models is better than the Koyck model because in the Koyck model
a. The lagged explanatory variable forms part of the set of explanatory variables, creating estimation
problems 
b. It is assumed that the beta parameters decline geometrically 
c. The number of lags is decided subjectively 
d. Explanatory variables exhibit multicollinearity 
20. The test statistic used to study the direction of causality between two time series variables is
a. Bilateral causality test 
b. Granger causality test 
c. Almon’s test 
d. Distributed lag model test 
21. In testing the causality between two variables, it is assumed that
a. At least one variable is stationary 
b. Both variables are stationary 
c. Both variables are non-stochastic 
d. Both variables have constant variance 
22. Which of the following test statistics is used to test the causality relation between two variables?
a. Student t-test 
b. Chi-square test 
c. F-test 
d. Lagrange multiplier test 
23. Regressing $Y_t$ on $X_t$, if we find that current and lagged Y values do not explain $X_t$, then $X_t$ is said to be
a. Weakly exogenous 
b. Strongly exogenous 
c. Super-exogenous 
d. Endogenous 
24. Which of the following is a necessary condition with regard to $X_t$ if we use the regression model for forecasting?
a. $X_t$ should be weakly exogenous.
b. $X_t$ should be strongly exogenous.
c. $X_t$ should be super-exogenous.
d. $X_t$ should be endogenous.
25. If a variable (X) ‘Granger-causes’ another variable (Y), then
a. We can say that X is exogenous 
b. We cannot say that X is exogenous 
c. We can say that X is endogenous
d. We cannot say that X is endogenous 


Exercises 


Questions 


17.1. Explain with a brief reason whether the following statements are true, false, or uncertain:

a. All econometric models are essentially dynamic. 

b. The Koyck model will not make much sense if some of the distributed-lag coefficients are positive 
and some are negative. 

c. If the Koyck and adaptive expectations models are estimated by OLS, the estimators will be biased 
but consistent. 

d. In the partial adjustment model, OLS estimators are biased in finite samples. 

e. In the presence of a stochastic regressor(s) and an autocorrelated error term, the method of instrumental variables will produce unbiased as well as consistent estimates.

f. In the presence of a lagged regressand as a regressor, the Durbin-Watson d statistic to detect autocorrelation is practically useless.

g. The Durbin h test is valid in both large and small samples.

h. The Granger test is a test of precedence rather than a test of causality. 

17.2. Establish Eq. (17.7.2).

17.3. Prove Eq. (17.8.3).

17.4. Assume that prices are formed according to the following adaptive expectations hypothesis:

$$
P_t^* = \gamma P_{t-1} + (1 - \gamma) P_{t-1}^*
$$

where $P^*$ is the expected price and $P$ the actual price.
Complete the following table, assuming $\gamma = 0.5$:*

Period     P*     P
t − 3      100    110
t − 2             125
t − 1             155
t                 185


17.5. Consider the model

$$
Y_t = \alpha + \beta_1 X_{1t} + \beta_2 X_{2t} + \beta_3 Y_{t-1} + v_t
$$

Suppose $Y_{t-1}$ and $v_t$ are correlated. To remove the correlation, suppose we use the following instrumental variable approach: First regress $Y_t$ on $X_{1t}$ and $X_{2t}$ and obtain the estimated $\hat{Y}_t$ from this regression. Then regress

$$
Y_t = \alpha + \beta_1 X_{1t} + \beta_2 X_{2t} + \beta_3 \hat{Y}_{t-1} + v_t
$$


*Adapted from C. K. Shaw, op. cit., p. 26.


where the $\hat{Y}_{t-1}$ are estimated from the first-stage regression.

a. How does this procedure remove the correlation between $Y_{t-1}$ and $v_t$ in the original model?

b. What are the advantages of the recommended procedure over the Liviatan approach?

17.6. a. Establish (17.4.8).

b. Evaluate the median lag for $\lambda$ = 0.2, 0.4, 0.6, 0.8.

c. Is there any systematic relationship between the value of $\lambda$ and the value of the median lag?

17.7. a. Prove that for the Koyck model, the mean lag is as shown in Eq. (17.4.10).

b. If $\lambda$ is relatively large, what are its implications?

17.8. Using the formula for the mean lag given in Eq. (17.4.9), verify the mean lag of 10.959 quarters
reported in the illustration of Table 17.1. 

17.9. Suppose

$$
M_t = \alpha + \beta_1 Y_t^* + \beta_2 R_t^* + u_t
$$

where M = demand for real cash balances, $Y^*$ = expected real income, and $R^*$ = expected interest rate. Assume that expectations are formulated as follows:

$$
\begin{aligned}
Y_t^* &= \gamma_1 Y_t + (1 - \gamma_1) Y_{t-1}^* \\
R_t^* &= \gamma_2 R_t + (1 - \gamma_2) R_{t-1}^*
\end{aligned}
$$

where $\gamma_1$ and $\gamma_2$ are coefficients of expectation, both lying between 0 and 1.


a. How would you express M, in terms of the observable quantities? 
b. What estimation problems do you foresee? 


*17.10. If you estimate Eq. (17.7.2) by OLS, can you derive estimates of the original parameters? What problems do you foresee? (For details, see Roger N. Waud.)†


17.11. Serial correlation model. Consider the following model:

$$
Y_t = \alpha + \beta X_t + u_t
$$

Assume that $u_t$ follows the Markov first-order autoregressive scheme given in Chapter 12, namely,

$$
u_t = \rho u_{t-1} + \varepsilon_t
$$

where $\rho$ is the coefficient of (first-order) autocorrelation and where $\varepsilon_t$ satisfies all the assumptions of the classical OLS. Then, as shown in Chapter 12, the model

$$
Y_t = \alpha(1 - \rho) + \beta(X_t - \rho X_{t-1}) + \rho Y_{t-1} + \varepsilon_t
$$

will have a serially independent error term, making OLS estimation possible. But this model, called the serial correlation model, very much resembles the Koyck, adaptive expectations, and partial adjustment models. How would you know in any given situation which of the preceding models is appropriate?‡
17.12. Consider the Koyck (or, for that matter, the adaptive expectations) model given in Eq. (17.4.7), namely,

$$
Y_t = \alpha(1 - \lambda) + \beta_0 X_t + \lambda Y_{t-1} + (u_t - \lambda u_{t-1})
$$

*Optional.


t“Misspecification in the ‘Partial Adjustment’ and ‘Adaptive Expectations’ Models,” International Economic Review, vol. 9, 
no. 2, June 1968, pp. 204-217. 

“For a discussion of the serial correlation model, see Zvi Griliches, “Distributed Lags: A Survey,” Econometrica, vol. 35, no. 
1, january 1967, p. 34. 
Suppose in the original model $u_t$ follows the first-order autoregressive scheme $u_t - \rho u_{t-1} = \varepsilon_t$, where ρ is the
coefficient of autocorrelation and where $\varepsilon_t$ satisfies all the classical OLS assumptions.
a. If ρ = λ, can the Koyck model be estimated by OLS?
b. Will the estimates thus obtained be unbiased? Consistent? Why or why not?
c. How reasonable is it to assume that ρ = λ?
17.13. Triangular, or arithmetic, distributed-lag model.* This model assumes that the stimulus (explanatory variable)
exerts its greatest impact in the current time period and then declines by equal decrements to zero as one goes into
the distant past. Geometrically, it is shown in Figure 17.9. Following this distribution, suppose we run the following
succession of regressions:

$Y_t = \alpha + \beta\left(\frac{2X_t + X_{t-1}}{3}\right)$

$Y_t = \alpha + \beta\left(\frac{3X_t + 2X_{t-1} + X_{t-2}}{6}\right)$

$Y_t = \alpha + \beta\left(\frac{4X_t + 3X_{t-1} + 2X_{t-2} + X_{t-3}}{10}\right)$

etc., and choose the regression that gives the highest R² as the “best” regression. Comment on this
strategy.

Figure 17.9 Triangular or arithmetic lag scheme.
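To see what each regression in this succession actually uses, the sketch below builds the triangular-lag regressor for a lag length m, with weights m, m − 1, ..., 1 divided by their sum m(m + 1)/2, so that m = 2, 3, 4 give the denominators 3, 6, 10. It is a Python/NumPy illustration under those assumptions; `triangular_regressor` is a hypothetical helper name.

```python
import numpy as np


def triangular_regressor(x, m):
    """Weighted regressor (m*X_t + (m-1)*X_{t-1} + ... + 1*X_{t-m+1}) / (m(m+1)/2).

    The result is aligned with x[m-1:], i.e. the first m-1 observations are lost.
    """
    x = np.asarray(x, dtype=float)
    weights = np.arange(m, 0, -1, dtype=float)   # m, m-1, ..., 1
    weights /= weights.sum()                     # divide by m(m+1)/2
    # x[t - np.arange(m)] picks X_t, X_{t-1}, ..., X_{t-m+1}
    return np.array([weights @ x[t - np.arange(m)] for t in range(m - 1, len(x))])


print(triangular_regressor([1.0, 2.0, 3.0, 4.0], 2))   # (2*X_t + X_{t-1}) / 3
```

Each extra lag drops one more observation from the start of the sample, which is worth keeping in mind when comparing R² values across the regressions.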

17.14. From the quarterly data for the period 1950-1960, F. P. R. Brechling obtained the following demand
function for labor for the British economy (the figures in parentheses are standard errors):†

$\dot{E}_t = 14.22 + 0.172Q_t - 0.028t - 0.0007t^2 - 0.297E_{t-1}$
(2.61)   (0.014)   (0.015)   (0.0002)   (0.033)
$R^2 = 0.76 \qquad d = 1.37$

where $\dot{E}_t = E_t - E_{t-1}$
Q = output
t = time

The preceding equation was based on the assumption that the desired level of employment $E_t^*$ is a
function of output, time, and time squared and on the hypothesis that $E_t - E_{t-1} = \delta(E_t^* - E_{t-1})$,
where δ, the coefficient of adjustment, lies between 0 and 1.

a. Interpret the preceding regression.

b. What is the value of δ?

c. Derive the long-run demand function for labor from the estimated short-run demand function.
d. How would you test for serial correlation in the preceding model?
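Under the stated partial-adjustment hypothesis, $E_t - E_{t-1} = \delta E_t^* - \delta E_{t-1}$, so the coefficient of $E_{t-1}$ is −δ and each remaining short-run coefficient equals δ times the corresponding coefficient of the long-run (desired) employment function. A minimal sketch of that arithmetic on the reported point estimates (it ignores the standard errors and is not a full answer to the exercise):

```python
# Reported short-run estimates (dependent variable E_t - E_{t-1})
short_run = {"constant": 14.22, "Q_t": 0.172, "t": -0.028, "t^2": -0.0007}
coef_lagged_E = -0.297

delta = -coef_lagged_E                                          # coefficient of adjustment
long_run = {name: b / delta for name, b in short_run.items()}   # coefficients of E*_t

print(f"delta = {delta:.3f}")
for name, b in long_run.items():
    print(f"{name:>8}: {b:.4f}")
```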

*This model was proposed by Irving Fisher in “Note on a Short-Cut Method for Calculating Distributed Lags,” International
Statistical Bulletin, 1937, pp. 323-328.


†F. P. R. Brechling, “The Relationship between Output and Employment in British Manufacturing Industries,” Review of
Economic Studies, vol. 32, July 1965.


17.15. In studying the farm demand for tractors, Griliches used the following model:

$T_t^* = \beta_0 X_{1,t-1}^{\beta_1} X_{2,t-1}^{\beta_2}$

where T* = desired stock of tractors
X1 = relative price of tractors
X2 = interest rate
Using the stock adjustment model, he obtained the following results for the period 1921-1957:

$\log T_t = \text{constant} - 0.218 \log X_{1,t-1} - 0.855 \log X_{2,t-1} + 0.864 \log T_{t-1}$
(0.051)   (0.170)   (0.035)
$R^2 = 0.987$

where the figures in the parentheses are the estimated standard errors.

a. What is the estimated coefficient of adjustment?

b. What are the short- and long-run price elasticities?

c. What are the corresponding interest elasticities?

d. What are the reasons for a high or low rate of adjustment in the present model?

17.16. Whenever the lagged dependent variable appears as an explanatory variable, the R² is usually much
higher than when it is not included. What are the reasons for this observation?


Figure 17.10 Hypothetical lag structures.


†Zvi Griliches, “The Demand for a Durable Input: Farm Tractors in the United States, 1921-1957,” in Arnold C. Harberger,
ed., The Demand for Durable Goods, University of Chicago Press, Chicago, 1960.
17.17. Consider the lag patterns in Figure 17.10. What degree polynomials would you fit to the lag structures
and why?
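17.18. Consider Eq. (17.13.4):

$\beta_i = a_0 + a_1 i + a_2 i^2 + \cdots + a_m i^m$

To obtain the variance of $\hat\beta_i$ from the variances of $\hat a_j$, we use the following formula:

$\operatorname{var}(\hat\beta_i) = \operatorname{var}(\hat a_0 + \hat a_1 i + \hat a_2 i^2 + \cdots + \hat a_m i^m) = \sum_{j=0}^{m} i^{2j} \operatorname{var}(\hat a_j) + 2\sum_{j<p} i^{(j+p)} \operatorname{cov}(\hat a_j, \hat a_p)$

a. Using the preceding formula, find the variance of $\hat\beta_i$ expressed as

$\hat\beta_i = \hat a_0 + \hat a_1 i + \hat a_2 i^2$

$\hat\beta_i = \hat a_0 + \hat a_1 i + \hat a_2 i^2 + \hat a_3 i^3$

b. If the variances of the $\hat a_j$ are large relative to the $\hat a_j$ themselves, will the variance of $\hat\beta_i$ be large also? Why or
why not?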
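In matrix form the formula above is $\operatorname{var}(\hat\beta_i) = c'Vc$, where $c = (1, i, i^2, \ldots, i^m)'$ and V is the estimated covariance matrix of $(\hat a_0, \ldots, \hat a_m)$. The sketch below computes it both ways, using a hypothetical covariance matrix, to show that the two forms agree.

```python
import numpy as np


def var_beta_i(i, cov_a):
    """var(beta_i_hat) = c' V c with c = (1, i, i^2, ..., i^m) and V = Cov(a_hat)."""
    cov_a = np.asarray(cov_a, dtype=float)
    c = float(i) ** np.arange(cov_a.shape[0])
    return c @ cov_a @ c


def var_beta_i_explicit(i, cov_a):
    """Same quantity, term by term: sum_j i^(2j) var(a_j) + 2 sum_{j<p} i^(j+p) cov(a_j, a_p)."""
    m1 = len(cov_a)
    total = sum(i ** (2 * j) * cov_a[j][j] for j in range(m1))
    total += 2 * sum(i ** (j + p) * cov_a[j][p] for j in range(m1) for p in range(j + 1, m1))
    return total


# Hypothetical covariance matrix of (a0_hat, a1_hat, a2_hat) for a second-degree polynomial
V = [[0.04, 0.01, -0.002],
     [0.01, 0.09, 0.003],
     [-0.002, 0.003, 0.01]]
for i in range(5):
    print(i, var_beta_i(i, V), var_beta_i_explicit(i, V))
```

Note that the covariance terms enter with weights that grow with i, so their signs can matter a good deal at longer lags.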
17.19. Consider the following distributed-lag model:

$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \beta_3 X_{t-3} + \beta_4 X_{t-4} + u_t$

Figure 17.11 Inverted V distributed-lag model.

Assume that $\beta_i$ can be adequately expressed by the second-degree polynomial as follows:

$\beta_i = a_0 + a_1 i + a_2 i^2$

How would you estimate the β’s if we want to impose the restriction that $\beta_0 = \beta_4 = 0$?
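One possible approach, sketched here under stated assumptions rather than as the definitive answer: $\beta_0 = 0$ forces $a_0 = 0$, and $\beta_4 = 0$ then gives $4a_1 + 16a_2 = 0$, i.e., $a_1 = -4a_2$, so $\beta_i = a_2\, i(i - 4)$. The model then collapses to $Y_t = \alpha + a_2 Z_t + u_t$ with $Z_t = \sum_{i=0}^{4} i(i-4) X_{t-i} = -3X_{t-1} - 4X_{t-2} - 3X_{t-3}$, which OLS can estimate. The simulation below uses hypothetical values ($a_2 = -0.25$) to check that the β's are recovered.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
a2_true, alpha = -0.25, 1.0                   # hypothetical values; beta_i = a2 * i * (i - 4)
lags = np.arange(5)
beta_true = a2_true * lags * (lags - 4)       # [0, 0.75, 1.0, 0.75, 0]

# Lag matrix: column i holds X_{t-i}, for t = 4, ..., n-1
X_lags = np.column_stack([x[4 - i : n - i] for i in lags])
y = alpha + X_lags @ beta_true + rng.normal(scale=0.5, size=n - 4)

# Constructed regressor Z_t = sum_i i(i-4) X_{t-i}; its OLS coefficient estimates a2
z = X_lags @ (lags * (lags - 4))
D = np.column_stack([np.ones_like(z), z])
(alpha_hat, a2_hat), *_ = np.linalg.lstsq(D, y, rcond=None)
beta_hat = a2_hat * lags * (lags - 4)         # end restrictions hold by construction
print(np.round(beta_hat, 3))
```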
17.20. The inverted V distributed-lag model. Consider the k-period finite distributed-lag model

$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_k X_{t-k} + u_t$

F. DeLeeuw has proposed the structure for the β’s as in Figure 17.11, where the β’s follow the
inverted V shape. Assuming for simplicity that k (the maximum length of the lag) is an even number,
and further assuming that $\beta_0$ and $\beta_k$ are zero, DeLeeuw suggests the following scheme for the β’s:*

$\beta_i = i\beta, \qquad 0 \le i \le k/2$
$\beta_i = (k - i)\beta, \qquad k/2 \le i < k$

How would you use the DeLeeuw scheme to estimate the parameters of the preceding k-period
distributed-lag model?
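Under the DeLeeuw scheme every $\beta_i$ is a known multiple $w_i$ of a single parameter β, so the k-period model reduces to $Y_t = \alpha + \beta Z_t + u_t$ with $Z_t = \sum_i w_i X_{t-i}$. A minimal sketch of the weights and the constructed regressor, assuming k is even and $w_i = i$ up to k/2 and $k - i$ beyond (the helper names are hypothetical):

```python
import numpy as np


def deleeuw_weights(k):
    """Inverted-V weights w_0, ..., w_k with beta_i = w_i * beta (k assumed even)."""
    if k % 2:
        raise ValueError("k is assumed to be even")
    i = np.arange(k + 1)
    return np.where(i <= k // 2, i, k - i).astype(float)   # w_0 = w_k = 0


def deleeuw_regressor(x, k):
    """Z_t = sum_{i=0}^{k} w_i X_{t-i}, aligned with t = k, ..., n-1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    lag_matrix = np.column_stack([x[k - i : n - i] for i in range(k + 1)])  # column i = X_{t-i}
    return lag_matrix @ deleeuw_weights(k)


print(deleeuw_weights(4))                           # weights 0, 1, 2, 1, 0
print(deleeuw_regressor([1, 2, 3, 4, 5, 6], 4))
```

Regressing $Y_t$ on a constant and $Z_t$ by OLS then gives $\hat\beta$, and the individual lag coefficients follow as $\hat\beta_i = w_i \hat\beta$.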

17.21. Refer to Exercise 12.15. Since the d value shown there is of little use in detecting (first-order) autocorrelation
(why?), how would you test for autocorrelation in this case?


Empirical Exercises 


17.22. Consider the following model:

$Y_t^* = \alpha + \beta_0 X_t + u_t$

where $Y^*$ = desired, or long-run, business expenditure for new plant and equipment, $X_t$ = sales, and t
= time. Using the stock adjustment model, estimate the parameters of the long- and short-run demand
function for expenditure on new plant and equipment given in Table 17.10.

How would you find out if there is serial correlation in the data?


Table 17.10 Investment in Fixed Plant and Equipment in Manufacturing Y and Manufacturing Sales X2, in Billions
of Dollars, Seasonally Adjusted, United States, 1970-1991


Year Plant Expenditure, Y Sales, X2 Year Plant Expenditure, Y Sales, X2 
1970 36.99 52.805 1981 128.68 168.129 
1971 33.60 55.906 1982 123.97 163.351 
1972 35.42 63.027 1983 117.35 172.547 
1973 42.35 72.931 1984 139.61 190.682 
1974 52.48 84.790 1985 152.88 194.538 
1975 53.66 86.589 1986 137.95 194.657 
1976 58.53 98.797 1987 141.06 206.326 
1977 67.48 113.201 1988 163.45 223.541 
1978 78.13 126.905 1989 183.80 232.724 
1979 95.13 143.936 1990 192.61 239.459 
1980 112.60 154.391 1991 182.81 235.142 


Source: Economic Report of the President, 1993. Data on Y from Table B-52, p. 407; data on X2 from Table B-53, p. 408.


17.23. Use the data of Exercise 17.22 but consider the following model:

$Y_t^* = \beta_0 X_t^{\beta_1} e^{u_t}$

Using the stock adjustment model (why?), estimate the short- and long-run elasticities of expenditure
on new plant and equipment with respect to sales. Compare your results with those for Exercise 17.22.


*See his article, “The Demand for Capital Goods by Manufacturers: A Study of Quarterly Time Series,” Econometrica, vol.
30, no. 3, July 1962, pp. 407-423.
Which model would you choose and why? Is there serial correlation in the data? How do you know?

17.24. Use the data of Exercise 17.22 but assume that

$Y_t = \alpha + \beta X_t^* + u_t$

where $X_t^*$ are the desired sales. Estimate the parameters of this model and compare the results with
those obtained in Exercise 17.22. How would you decide which is the appropriate model? On the
basis of the h statistic, would you conclude there is serial correlation in the data?

17.25. Suppose someone convinces you that the relationship between business expenditure for new plant and
equipment and sales is as follows:

$Y_t^* = \alpha + \beta X_t^* + u_t$

where $Y^*$ is desired expenditure and $X^*$ is desired or expected sales. Use the data given in Exercise
17.22 to estimate this model and comment on your results.

17.26. Using the data given in Exercise 17.22, determine whether plant expenditure Granger-causes sales or
whether sales Granger-cause plant expenditure. Use up to six lags and comment on your results. What
important conclusion do you draw from this exercise?

17.27. Assume that sales in Exercise 17.22 have a distributed-lag effect on expenditure on plant and equipment.
Fit a suitable Almon lag model to the data.

17.28. Reestimate Eq. (17.13.16) imposing (1) the near-end restriction, (2) the far-end restriction, and (3) both end
restrictions, and compare your results with those given in Eq. (17.13.16). What general conclusion do you draw?

17.29. Table 17.11 gives data on private fixed investment in information processing equipment and software (Y, in
billions of dollars), sales in total manufacturing and trade ($X_2$, in millions of dollars), and the interest rate
($X_3$, Moody’s Aaa corporate bond rate, percent); data on Y and $X_2$ are seasonally adjusted.

a. Test for bilateral causality between Y and $X_2$, paying careful attention to the lag length.

b. Test for bilateral causality between Y and $X_3$, again paying careful attention to the lag length.

c. To allow for the distributed-lag effect of sales on investment, suppose you decide to use the Almon
lag technique. Show the estimated model, after paying due attention to the length of the lag as well
as the degree of the polynomial.

17.30. Table 17.12 gives data on indexes of real compensation per hour (Y) and output per hour ($X_2$), with
both indexes to base 1992 = 100, in the business sector of the U.S. economy for the period 1960-1999,
as well as the civilian unemployment rate ($X_3$) for the same period.

a. How would you decide whether it is wage compensation that determines labor productivity or the
other way round?

b. Develop a suitable model to test your conjecture in (a), providing the usual statistics.

c. Do you think the unemployment rate has any effect on wage compensation, and if so, how would
you take that into account? Show the necessary statistical analysis.

17.31. In a test of Granger causality, Christopher Sims exploits the fact that the future cannot cause the
present.* To decide whether a variable Y causes a variable X, Sims suggests estimating the following
pair of equations:

$Y_t = \alpha_1 + \sum_{i=0}^{n} \beta_i X_{t-i} + \sum_{i=1}^{m} \gamma_i Y_{t-i} + \sum_{i=1}^{p} \lambda_i X_{t+i} + u_{1t}$   (1)

$X_t = \alpha_2 + \sum_{i=0}^{n} \delta_i Y_{t-i} + \sum_{i=1}^{m} \theta_i X_{t-i} + \sum_{i=1}^{p} \omega_i Y_{t+i} + u_{2t}$   (2)

*C. A. Sims, “Money, Income, and Causality,” American Economic Review, vol. 62, 1972, pp. 540-552.
Table 17.11 Investments, Sales, and Interest Rate, United States, 1960-1999


Observation Investment Sales Interest Observation Investment Sales Interest
1960 4.9 60,827 4.41 1980 69.6 327,233 11.94
1961 5.2 61,159 4.35 1981 82.4 355,822 14.17
1962 5.7 65,662 4.33 1982 88.9 347,625 13.79
1963 6.5 68,995 4.26 1983 100.8 369,286 12.04
1964 7.3 73,682 4.40 1984 121.7 410,124 12.71
1965 8.5 80,283 4.49 1985 130.8 422,583 11.37
1966 10.6 87,187 5.13 1986 137.6 430,419 9.02
1967 11.2 90,820 5.51 1987 141.9 457,735 9.38
1968 11.9 96,685 6.18 1988 155.9 497,157 9.71
1969 14.6 105,690 7.03 1989 173.0 527,039 9.26
1970 16.7 108,221 8.04 1990 176.1 545,909 9.32
1971 17.3 116,895 7.39 1991 181.4 542,815 8.77
1972 19.3 131,081 7.21 1992 197.5 567,176 8.14
1973 23.0 153,677 7.44 1993 215.0 595,628 7.22
1974 26.8 177,912 8.57 1994 233.7 639,163 7.96
1975 28.2 182,198 8.83 1995 262.0 684,982 7.59
1976 32.4 204,150 8.43 1996 287.3 718,113 7.37
1977 38.6 229,513 8.02 1997 325.2 753,445 7.26
1978 48.3 260,320 8.73 1998 367.4 779,413 6.53
1979 58.6 297,701 9.63 1999 433.0 833,079 7.04

Notes: Investment = private fixed investment in information processing equipment and software, billions of dollars, seasonally adjusted.
Sales = sales in total manufacturing and trade, millions of dollars, seasonally adjusted.
Interest = Moody’s Aaa corporate bond rate, %.
Source: Economic Report of the President, 2001, Tables B-18, B-57, and B-73.
Table 17.12 Compensation, Productivity, and Unemployment Rate, United States, 1960-1999

Observation COMP PRODUCT UNRate Observation COMP PRODUCT UNRate
1960 60.0 48.8 5.5 1980 89.5 80.4 7.1
1961 61.8 50.6 6.7 1981 89.5 82.0 7.6
1962 63.9 52.9 5.5 1982 90.9 81.7 9.7
1963 65.4 55.0 5.7 1983 91.0 84.6 9.6
1964 67.9 57.9 5.2 1984 91.3 87.0 7.5
1965 69.4 59.6 4.5 1985 92.7 88.7 7.2
1966 71.9 62.0 3.8 1986 95.8 91.4 7.0
1967 73.8 63.4 3.8 1987 96.3 91.9 6.2
1968 76.3 65.4 3.6 1988 97.3 93.0 5.5
1969 77.4 65.7 3.5 1989 95.9 93.9 5.3
1970 78.9 67.0 4.9 1990 96.5 95.2 5.6
1971 80.4 69.9 5.9 1991 97.5 96.3 6.8
1972 82.7 72.2 5.6 1992 100.0 100.0 7.5
1973 84.5 74.5 4.9 1993 99.9 100.5 6.9
1974 83.5 73.2 5.6 1994 99.7 101.9 6.1
1975 84.4 75.8 8.5 1995 99.3 102.6 5.6
1976 86.8 78.5 7.7 1996 99.7 105.4 5.4
1977 87.9 79.8 7.1 1997 100.4 107.6 4.9
1978 89.5 80.7 6.1 1998 104.3 110.5 4.5
1979 89.7 80.7 5.8 1999 107.3 114.0 4.2


Notes: COMP = index of real compensation per hour (1992 = 100).
PRODUCT = index of output per hour (1992 = 100).
UNRate = civilian unemployment rate, %.
Source: Economic Report of the President, 2001, Table B-49, p. 332.
These regressions include the lagged, current, and future, or lead, values of the regressors; terms such
as $X_{t+1}$, $X_{t+2}$, etc., are called lead terms.

If Y is to Granger-cause X, then there must be some relationship between Y and the lead, or future,
values of X. Therefore, instead of testing that the $\beta_i$ are zero, we should test that the $\lambda_i$ (the coefficients
of the lead terms) are all zero in Eq. (1). If we reject this hypothesis, the causality runs from Y to X, and
not from X to Y, because the future cannot cause the present. Similar comments apply to Eq. (2).


Table 17.13 Macroeconomic Data for the Greek Economy, 1960-1995


Year PC PDI GrossInv GNP LTI


1960 107808 [illegible] 29121 145458 8
1961 115147 127599 31476 161802 8
1962 120050 135007 34128 164674 8
1963 126115 142128 35996 181534 8.25
1964 137192 159649 43445 196586 9
1965 147707 172756 49003 214922 9
1966 157687 182366 50567 228040 9
1967 167528 195611 49770 240791 9
1968 179025 204470 60397 257226 8.75
1969 190089 222638 71653 282168 8
1970 206813 246819 70663 304420 8
1971 217212 269249 80558 327723 8
1972 232312 297266 92977 356886 8
1973 250057 335522 100093 383916 9
1974 251650 310231 74500 369325 11.83
1975 266884 327521 74660 390000 11.88
1976 281066 350427 79750 415491 [illegible]
1977 293928 366730 85950 431164 [illegible]
1978 310640 390189 91100 458675 13.46
1979 318817 406857 99121 476048 16.71
1980 319341 401942 92705 485108 21.25
1981 325851 419669 85750 484259 21.33
1982 338507 421716 84100 483879 20.5
1983 339425 417930 83000 481198 20.5
1984 345194 434696 78300 490881 20.5
1985 358671 456576 82360 502258 20.5
1986 361026 439654 77234 507199 20.5
1987 365473 438454 73315 505713 21.82
1988 378488 476345 79831 529460 22.89
1989 394942 492334 87873 546572 23.26
1990 403194 495939 96139 546982 27.62
1991 412458 513173 91726 566586 29.45
1992 420028 502520 93140 568582 28.71
1993 420585 523066 91292 569724 28.56
1994 426893 520728 93073 579846 27.44
1995 433723 518407 98470 588691 23.05


Note: All nominal data are expressed at constant market prices of year 1970 in millions of drachmas. Private disposable income is deflated
by the consumption price deflator.
Source: H. R. Seddighi, K. A. Lawler, and A. V. Katos, Econometrics: A Practical Approach, Routledge, London, 2000, p. 158.
To carry out the Sims test, we estimate Eq. (1) without the lead terms (call it the restricted regression)
and then estimate Eq. (1) with the lead terms (call it the unrestricted regression). Then we carry out the
F test as indicated in Eq. (8.7.9). If the F statistic is significant (say, at the 5% level), then we
conclude that it is Y that Granger-causes X. Similar comments apply to Eq. (2).

Which test do we choose, Granger or Sims? We can apply both tests.* The one factor in
favor of the Granger test is that it uses fewer degrees of freedom because it does not use the lead
terms. If the sample is not sufficiently large, we will have to use the Sims test cautiously.

Refer to the data given in Exercise 12.34. For pedagogical purposes, apply the Sims test of causality
to determine whether it is sales that cause plant expenditure or vice versa. Use the last four years’
data as the lead terms in your analysis.
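A minimal sketch of the Sims procedure described above, for a single pair of series. It assumes Eq. (8.7.9) is the usual restricted-versus-unrestricted form $F = [(RSS_R - RSS_{UR})/m]/[RSS_{UR}/(n - k)]$, with m the number of lead terms and k the number of coefficients in the unrestricted regression. Both regressions are run on the same observations so that their RSS values are comparable. The data are simulated (hypothetically) so that Y drives future X, and `sims_f_test` is an illustrative name, not a library function.

```python
import numpy as np


def rss(y, X):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def sims_f_test(y, x, n_lags=2, n_leads=2):
    """F test of H0: all lead coefficients of x are zero in the regression of y_t on
    a constant, x_t, ..., x_{t-n_lags}, y_{t-1}, ..., y_{t-n_lags}, x_{t+1}, ..., x_{t+n_leads}.
    Returns (F, (df1, df2))."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    t = np.arange(n_lags, len(y) - n_leads)            # observations usable in both regressions
    base = [np.ones(len(t))]
    base += [x[t - i] for i in range(0, n_lags + 1)]   # current and lagged x
    base += [y[t - i] for i in range(1, n_lags + 1)]   # lagged y
    leads = [x[t + j] for j in range(1, n_leads + 1)]  # lead (future) x
    X_r = np.column_stack(base)                        # restricted: no lead terms
    X_ur = np.column_stack(base + leads)               # unrestricted: with lead terms
    rss_r, rss_ur = rss(y[t], X_r), rss(y[t], X_ur)
    df1, df2 = n_leads, len(t) - X_ur.shape[1]
    F = ((rss_r - rss_ur) / df1) / (rss_ur / df2)
    return F, (df1, df2)


# Simulated data in which Y drives future X: x_t = 0.8 y_{t-1} + e_t
rng = np.random.default_rng(7)
T = 500
y = rng.normal(size=T)
x = np.empty(T)
x[0] = rng.normal()
x[1:] = 0.8 * y[:-1] + rng.normal(size=T - 1)

F_yx, df_yx = sims_f_test(y, x)   # leads of X in the Y equation: should be significant
F_xy, df_xy = sims_f_test(x, y)   # leads of Y in the X equation: should not be
print(F_yx, df_yx, F_xy, df_xy)
```

The F values would then be compared with critical values of the F distribution with the returned degrees of freedom.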

17.32. Table 17.13 gives some macroeconomic data for the Greek economy for the years 1960-1995.

Consider the following consumption function:

$\ln PC_t^* = \beta_1 + \beta_2 \ln PDI_t + \beta_3 LTI_t + u_t$

where $PC_t^*$ = real desired private consumption expenditure at time t; $PDI_t$ = real private disposable
income at time t; $LTI_t$ = long-term interest rate at time t; and ln stands for natural logarithm.

a. From the data given in Table 17.13, estimate the preceding consumption function, stating clearly how
you measured the real desired private consumption expenditure.

b. What econometric problems did you encounter in estimating the preceding consumption function?
How did you resolve them? Explain fully.

17.33. Using the data in Table 17.13, develop a suitable model to explain the behavior of gross real investment
in the Greek economy for the period 1960-1995. Look up any textbook on macroeconomics for the
accelerator model of investment.


*The choice between Granger and Sims causality tests is not clear. For further discussion of these tests, see G. Chamberlain,
“The General Equivalence of Granger and Sims Causality,” Econometrica, vol. 50, 1982, pp. 569-582.


Key to Multiple Choice Questions 


1. (b) 2. (a) 3. (a) 4. (c) 5. (d) 6. (d) 7. (a) 8. (b) 9. (c)
10. (a) 11. (c) 12. (b) 13. (a) 14. (d) 15. (d) 16. (c) 17. (b) 18. (a)
19. (b) 20. (b) 21. (b) 22. (c) 23. (b) 24. (b) 25. (b)


Appendix 17A


17A.1 The Sargan Test for the Validity of Instruments


Suppose we use an instrumental variable(s) to replace an explanatory variable(s) that is correlated with the error term.
How valid is the instrumental variable(s), that is, how do we know that the instruments chosen are independent of the
error term? Sargan* has developed a statistic, dubbed SARG, to test the validity of the instruments used in instrumental-variable
(IV) estimation. The steps involved in SARG are as follows:**


*J. D. Sargan, “Wages and Prices in the United Kingdom: A Study in Econometric Methodology,” in P. E. Hart, C. Mills, and
J. K. Whitaker (eds.), Econometric Analysis for National Economic Planning, Butterworths, London, 1964.

**The following discussion leans on H. R. Seddighi, K. A. Lawler, and A. V. Katos, Econometrics: A Practical Approach,
Routledge, New York, 2000, pp. 155-156.
1. Divide the variables included in a regression equation into two groups, those that are independent of the error term
(say, $X_1, X_2, \ldots, X_p$) and those that are not independent of the error term (say, $Z_1, Z_2, \ldots, Z_q$).

2. Let $W_1, W_2, \ldots, W_s$ be the instruments chosen for the Z variables in step 1, where s > q.

3. Estimate the original regression, replacing the Z’s by the W’s, that is, estimate the original regression by IV, and
obtain the residuals, say, $\hat u$.

4. Regress $\hat u$ on a constant, all the X variables, and all the W variables, but exclude all the Z variables. Obtain R² from
this regression.

5. Now compute the SARG statistic, defined as:

$\text{SARG} = (n - k)R^2 \sim \chi^2_{s-q}$   (17A.1.1)

where n = the number of observations and k is the number of coefficients in the original regression
equation. Under the null hypothesis that the instruments are exogenous, Sargan has shown that the SARG
statistic asymptotically has the χ² distribution with (s − q) degrees of freedom, where s is the number of
instruments (i.e., the variables in W) and q is the number of endogenous regressors (the Z variables) in the
original equation. If the computed chi-square value in an application is statistically significant, we reject the
validity of the instruments. If it is not statistically significant, we do not reject the validity of the chosen
instruments. It should be emphasized that s must exceed q, that is, the number of instruments must be greater
than the number of endogenous regressors. If that is not the case (i.e., s = q), the SARG test is not valid.

6. The null hypothesis is that all (W) instruments are valid. If the computed chi-square exceeds the critical
chi-square value, we reject the null hypothesis, which means that at least one instrument is correlated
with the error term and therefore the IV estimates based on the chosen instruments are not valid.
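The steps above translate fairly directly into code. The sketch below is a Python/NumPy illustration, not a reference implementation. It uses two-stage least squares for the IV step, applies the (n − k)R² form of Eq. (17A.1.1) (many software packages report nR² instead, which is asymptotically equivalent), and returns the statistic with its s − q degrees of freedom rather than a p-value. The data are simulated, with one endogenous regressor and two instruments: once with both instruments exogenous, and once with one instrument deliberately correlated with the error. The function names are hypothetical.

```python
import numpy as np


def _lstsq(A, b):
    """Least-squares coefficients of b (vector or matrix) on the columns of A."""
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef


def sargan_test(y, X, Z, W):
    """SARG = (n - k) R^2. y: (n,); X: (n, p) exogenous; Z: (n, q) endogenous;
    W: (n, s) instruments for Z with s > q. Returns (SARG, s - q)."""
    n = y.shape[0]
    q, s = Z.shape[1], W.shape[1]
    if s <= q:
        raise ValueError("need more instruments than endogenous regressors (s > q)")
    ones = np.ones((n, 1))
    R = np.hstack([ones, X, Z])        # regressors of the original equation
    inst = np.hstack([ones, X, W])     # the X's serve as their own instruments
    # Step 3: IV (2SLS) estimation; residuals use the original Z, not fitted values
    R_hat = inst @ _lstsq(inst, R)     # first stage
    beta_iv = _lstsq(R_hat, y)         # second stage
    u_hat = y - R @ beta_iv
    # Step 4: regress residuals on a constant, the X's and the W's; compute R^2
    fitted = inst @ _lstsq(inst, u_hat)
    r2 = 1.0 - np.sum((u_hat - fitted) ** 2) / np.sum((u_hat - u_hat.mean()) ** 2)
    # Step 5: SARG statistic with k = number of coefficients in the original equation
    k = R.shape[1]
    return (n - k) * r2, s - q


# Simulated example: z is endogenous because it contains the error u
rng = np.random.default_rng(0)
n = 5000
x, w1, g, u, v = rng.standard_normal((5, n))
z = w1 + g + u + v
y = 1.0 + 0.5 * x + 2.0 * z + u
X, Z = x.reshape(-1, 1), z.reshape(-1, 1)

sarg_valid, df = sargan_test(y, X, Z, np.column_stack([w1, g]))        # both exogenous
sarg_invalid, _ = sargan_test(y, X, Z, np.column_stack([w1, g + u]))   # 2nd correlated with u
print(f"valid: SARG = {sarg_valid:.2f}, invalid: SARG = {sarg_invalid:.2f}, df = {df}")
```

With one degree of freedom the 5% critical χ² value is about 3.84; in this simulation the invalid instrument set should produce a statistic far above that, while the valid set typically does not.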


SIMULTANEOUS-EQUATION 
MODELS AND TIME SERIES 
ECONOMETRICS 


A casual look at the published empirical work in business and economics will reveal that many economic 
relationships are of the single-equation type. That is why we devoted the first three parts of this book to the 
discussion of single-equation regression models. In such models, one variable (the dependent variable Y) 
is expressed as a linear function of one or more other variables (the explanatory variables, the X’s). In such 
models an implicit assumption is that the cause-and-effect relationship, if any, between Y and the X’s is unidirectional:
The explanatory variables are the cause and the dependent variable is the effect.

However, there are situations where there is a two-way flow of influence among economic variables; that 
is, one economic variable affects another economic variable(s) and is, in turn, affected by it (them). Thus, in 
the regression of money M on the rate of interest r, the single-equation methodology assumes implicitly that 
the rate of interest is fixed (say, by the Federal Reserve System) and tries to find out the response of money 
demanded to the changes in the level of the interest rate. But what happens if the rate of interest depends on 
the demand for money? In this case, the conditional regression analysis made in this book thus far may not 
be appropriate because now M depends on r and r depends on M. Thus, we need to consider two equations, 
one relating M to r and another relating r to M. And this leads us to consider simultaneous-equation models, 
models in which there is more than one regression equation, one for each interdependent variable. 

In Part 4 we present a very elementary and often heuristic introduction to the complex subject of simultaneous-equation
models, the details being left for the references.

In Chapter 18, we provide several examples of simultaneous-equation models and show why the method 
of ordinary least squares considered previously is generally inapplicable in estimating the parameters of each 
of the equations in the model. 
In Chapter 19, we consider the so-called identification problem. If in a system of simultaneous equations
containing two or more equations it is not possible to obtain numerical values of each parameter in each 
equation because the equations are observationally indistinguishable, or look too much like one another, then 
we have the identification problem. Thus, in the regression of quantity Q on price P, is the resulting equation 
a demand function or a supply function (for Q and P enter into both functions)? Therefore, if we have data 
on Q and P only and no other information, it will be difficult if not impossible to identify the regression 
as the demand or supply function. It is essential to resolve the identification problem before we proceed to 
estimation because if we do not know what we are estimating, estimation per se is meaningless. In Chapter 
19 we offer various methods of solving the identification problem. 

In Chapter 20, we consider several estimation methods that are designed specifically for estimating the 
simultaneous-equation models and consider their merits and limitations. 


CHAPTER 18

Simultaneous-Equation Models


In this and the following two chapters we discuss the simultaneous-equation models. In particular, we discuss 
their special features, their estimation, and some of the statistical problems associated with them. 


18.1 The Nature of Simultaneous-Equation Models 


In Parts 1 to 3 of this text we were concerned exclusively with single-equation models, i.e., models in which 
there was a single dependent variable Y and one or more explanatory variables, the X’s. In such models the 
emphasis was on estimating and/or predicting the average value of Y conditional upon the fixed values of 
the X variables. The cause-and-effect relationship, if any, in such models therefore ran from the X's to the Y. 

But in many situations, such a one-way or unidirectional cause-and-effect relationship is not meaningful. 
This occurs if Y is determined by the X’s, and some of the X’s are, in turn, determined by Y. In short, there is a 
two-way, or simultaneous, relationship between Y and (some of) the X’s, which makes the distinction between 
dependent and explanatory variables of dubious value. It is better to lump together a set of variables that can 
be determined simultaneously by the remaining set of variables—precisely what is done in simultaneous- 
equation models. In such models there is more than one equation—one for each of the mutually, or jointly, 
dependent or endogenous variables.¹ And unlike the single-equation models, in the simultaneous-equation
models one may not estimate the parameters of a single equation without taking into account information 
provided by other equations in the system. 

What happens if the parameters of each equation are estimated by applying, say, the method of ordinary 
least squares (OLS), disregarding other equations in the system? Recall that one of the crucial assumptions 
of the method of OLS is that the explanatory X variables are either nonstochastic or, if stochastic (random), 
distributed independently of the stochastic disturbance term. If neither of these conditions is met, then, as 
shown later, the least-squares estimators are not only biased but also inconsistent; that is, as the sample size
increases indefinitely, the estimators do not converge to their true (population) values.

¹In the context of the simultaneous-equation models, the jointly dependent variables are called endogenous variables
and the variables that are truly nonstochastic or can be so regarded are called the exogenous, or predetermined,
variables. (More on this in Chapter 19.)

Thus, in the following hypothetical system of equations,²


$Y_{1i} = \beta_{10} + \beta_{12} Y_{2i} + \gamma_{11} X_{1i} + u_{1i}$   (18.1.1)


$Y_{2i} = \beta_{20} + \beta_{21} Y_{1i} + \gamma_{21} X_{1i} + u_{2i}$   (18.1.2)


where $Y_1$ and $Y_2$ are mutually dependent, or endogenous, variables and $X_1$ is an exogenous variable and where
$u_1$ and $u_2$ are the stochastic disturbance terms, the variables $Y_1$ and $Y_2$ are both stochastic. Therefore, unless
it can be shown that the stochastic explanatory variable $Y_2$ in Eq. (18.1.1) is distributed independently of $u_1$
and the stochastic explanatory variable $Y_1$ in Eq. (18.1.2) is distributed independently of $u_2$, application of the
classical OLS to these equations individually will lead to inconsistent estimates.

In the remainder of this chapter we give a few examples of simultaneous-equation models and show 
the bias involved in the direct application of the least-squares method to such models. After discussing 
the so-called identification problem in Chapter 19, in Chapter 20 we discuss some of the special methods 
developed to handle the simultaneous-equation models. 


18.2 Examples of Simultaneous-Equation Models 


Example 18.1 Demand-and-Supply Model 


As is well known, the price P of a commodity and the quantity Q sold are determined by the intersection of the
demand-and-supply curves for that commodity. Thus, assuming for simplicity that the demand-and-supply
curves are linear and adding the stochastic disturbance terms $u_1$ and $u_2$, we may write the empirical demand-
and-supply functions as:

Demand function: $Q_t^d = \alpha_0 + \alpha_1 P_t + u_{1t}, \quad \alpha_1 < 0$   (18.2.1)
Supply function: $Q_t^s = \beta_0 + \beta_1 P_t + u_{2t}, \quad \beta_1 > 0$   (18.2.2)
Equilibrium condition: $Q_t^d = Q_t^s$

where $Q^d$ = quantity demanded
$Q^s$ = quantity supplied
t = time
and the α’s and β’s are the parameters. A priori, $\alpha_1$ is expected to be negative (downward-sloping demand
curve), and $\beta_1$ is expected to be positive (upward-sloping supply curve).

Now it is not too difficult to see that P and Q are jointly dependent variables. If, for example, $u_{1t}$ in Eq.
(18.2.1) changes because of changes in other variables affecting $Q_t^d$ (such as income, wealth, and tastes),
the demand curve will shift upward if $u_{1t}$ is positive and downward if $u_{1t}$ is negative. These shifts are shown
in Figure 18.1.

As the figure shows, a shift in the demand curve changes both P and Q. Similarly, a change in $u_{2t}$ (because
of strikes, weather, import or export restrictions, etc.) will shift the supply curve, again affecting both P and
Q. Because of this simultaneous dependence between Q and P, $u_{1t}$ and $P_t$ in Eq. (18.2.1) and $u_{2t}$ and $P_t$ in
Eq. (18.2.2) cannot be independent. Therefore, a regression of Q on P as in Eq. (18.2.1) would violate an
important assumption of the classical linear regression model, namely, the assumption of no correlation
between the explanatory variable(s) and the disturbance term.
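A small simulation can make the correlation between $P_t$ and the disturbances concrete. The sketch below, with hypothetical parameter values and independent standard normal shocks, generates market-clearing data from Eqs. (18.2.1) and (18.2.2) and regresses Q on P. Under these assumptions the OLS slope settles near $(\beta_1\sigma_1^2 + \alpha_1\sigma_2^2)/(\sigma_1^2 + \sigma_2^2)$, here −0.25, which is neither the demand slope nor the supply slope.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
a0, a1 = 10.0, -1.0        # hypothetical demand parameters (a1 < 0)
b0, b1 = 2.0, 0.5          # hypothetical supply parameters (b1 > 0)
u1 = rng.normal(size=n)    # demand shocks, variance 1
u2 = rng.normal(size=n)    # supply shocks, variance 1

# Market clearing: a0 + a1*P + u1 = b0 + b1*P + u2, solved for P, then Q from supply
P = (a0 - b0 + u1 - u2) / (b1 - a1)
Q = b0 + b1 * P + u2

slope = np.cov(P, Q)[0, 1] / np.var(P, ddof=1)   # OLS slope of Q on P
print(f"OLS slope = {slope:.3f}; demand slope = {a1}; supply slope = {b1}")
```

Because every observed (P, Q) pair is an intersection of a shifted demand curve with a shifted supply curve, the fitted line mixes the two; which way it leans depends on the relative variances of the two shocks.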


²These economical but self-explanatory notations will be generalized to more than two equations in Chapter 19.

Figure 18.1 Interdependence of price and quantity.


Example 18.2 Keynesian Model of Income Determination 


Consider the simple Keynesian model of income determination:

Consumption function: $C_t = \beta_0 + \beta_1 Y_t + u_t, \quad 0 < \beta_1 < 1$   (18.2.3)
Income identity: $Y_t = C_t + I_t \; (= C_t + S_t)$   (18.2.4)

where C = consumption expenditure
Y = income
I = investment (assumed exogenous)
S = savings
t = time
u = stochastic disturbance term
$\beta_0$ and $\beta_1$ = parameters

The parameter $\beta_1$ is known as the marginal propensity to consume (MPC) (the amount of extra consumption
expenditure resulting from an extra dollar of income). From economic theory, $\beta_1$ is expected to lie between
0 and 1. Equation (18.2.3) is the (stochastic) consumption function; and Eq. (18.2.4) is the national income
identity, signifying that total income is equal to total consumption expenditure plus total investment expenditure,
it being understood that total investment expenditure is equal to total savings. Diagrammatically, we
have Figure 18.2.

From the postulated consumption function and Figure 18.2 it is clear that C and Y are interdependent
and that $Y_t$ in Eq. (18.2.3) is not expected to be independent of the disturbance term because when $u_t$ shifts
(because of a variety of factors subsumed in the error term), then the consumption function also shifts, which,
in turn, affects $Y_t$. Therefore, once again the classical least-squares method is inapplicable to Eq. (18.2.3). If
applied, the estimators thus obtained will be inconsistent, as we shall show later.
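A preview of that inconsistency can be obtained by simulation. Assuming $I_t$ is exogenous and independent of $u_t$, solving Eqs. (18.2.3) and (18.2.4) gives $Y_t = (\beta_0 + I_t + u_t)/(1 - \beta_1)$, so $\operatorname{cov}(Y_t, u_t) = \sigma_u^2/(1 - \beta_1) \neq 0$, and the OLS slope tends to $\beta_1 + (1 - \beta_1)\sigma_u^2/(\sigma_I^2 + \sigma_u^2)$ rather than to $\beta_1$. The sketch below uses hypothetical values ($\beta_1 = 0.8$, $\sigma_I = 2$, $\sigma_u = 1$), for which that limit is 0.84.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200_000
beta0, beta1 = 10.0, 0.8                      # hypothetical consumption-function parameters
I = rng.normal(loc=50.0, scale=2.0, size=n)   # exogenous investment, sigma_I = 2
u = rng.normal(scale=1.0, size=n)             # disturbance, sigma_u = 1, independent of I

# Reduced form implied by C = beta0 + beta1*Y + u and Y = C + I
Y = (beta0 + I + u) / (1.0 - beta1)
C = beta0 + beta1 * Y + u

slope_ols = np.polyfit(Y, C, 1)[0]                          # OLS estimate of beta1
plim = beta1 + (1.0 - beta1) * 1.0**2 / (2.0**2 + 1.0**2)   # large-sample limit, 0.84
print(f"OLS estimate = {slope_ols:.4f}; true beta1 = {beta1}; limit = {plim:.2f}")
```

Increasing n moves the estimate closer to 0.84, not to 0.8: the bias does not disappear in large samples.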


Figure 18.2 Keynesian model of income determination (vertical axis: consumption, investment; horizontal axis: national income).


Example 18.3 Wage-Price Models 


Consider the following Phillips-type model of money-wage and price determination: 


$\dot{W}_t = \alpha_0 + \alpha_1 UN_t + \alpha_2 \dot{P}_t + u_{1t}$   (18.2.5)
$\dot{P}_t = \beta_0 + \beta_1 \dot{W}_t + \beta_2 \dot{R}_t + \beta_3 \dot{M}_t + u_{2t}$   (18.2.6)

where $\dot{W}$ = rate of change of money wages
UN = unemployment rate, %
$\dot{P}$ = rate of change of prices
$\dot{R}$ = rate of change of cost of capital
$\dot{M}$ = rate of change of price of imported raw material
t = time
$u_1, u_2$ = stochastic disturbances


Since the price variable P enters into the wage equation and the wage variable W enters into the price 
equation, the two variables are jointly dependent. Therefore, these stochastic explanatory variables are 
expected to be correlated with the relevant stochastic disturbances, once again rendering the classical OLS 
method inapplicable to estimate the parameters of the two equations individually. 


Example 18.4 The IS Model of Macroeconomics


The celebrated IS, or goods market equilibrium, model of macroeconomics,³ in its nonstochastic form can be expressed as:


Consumption function:  $C_t = \beta_0 + \beta_1 Y_{dt} \qquad 0 < \beta_1 < 1$  (18.2.7)
Tax function:  $T_t = \alpha_0 + \alpha_1 Y_t \qquad 0 < \alpha_1 < 1$  (18.2.8)
Investment function:  $I_t = \gamma_0 + \gamma_1 r_t$  (18.2.9)
Definition:  $Y_{dt} = Y_t - T_t$  (18.2.10)
Government expenditure:  $G_t = \bar{G}$  (18.2.11)
National income identity:  $Y_t = C_t + I_t + G_t$  (18.2.12)

where Y = national income
C = consumption spending
I = planned or desired net investment
$\bar{G}$ = given level of government expenditure
T = taxes
$Y_d$ = disposable income
r = interest rate
If you substitute Eqs. (18.2.10) and (18.2.8) into Eq. (18.2.7) and substitute the resulting equation for C and Eqs. (18.2.9) and (18.2.11) into Eq. (18.2.12), you should obtain the IS equation:

$$ Y_t = \pi_0 + \pi_1 r_t \tag{18.2.13} $$

where

$$ \pi_0 = \frac{\beta_0 - \alpha_0 \beta_1 + \gamma_0 + \bar{G}}{1 - \beta_1(1 - \alpha_1)}, \qquad \pi_1 = \frac{\gamma_1}{1 - \beta_1(1 - \alpha_1)} \tag{18.2.14} $$
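As a quick numerical check on Eq. (18.2.14), the following minimal Python sketch computes $\pi_0$ and $\pi_1$ for a set of hypothetical parameter values (chosen only for illustration, with $\gamma_1 < 0$ assumed so that investment falls as the interest rate rises) and confirms that points on the resulting IS line satisfy the structural equations (18.2.7) through (18.2.12). The function names `is_coefficients` and `goods_market_gap` are illustrative, not from any library.

```python
def is_coefficients(beta0, beta1, alpha0, alpha1, gamma0, gamma1, G):
    """Intercept and slope of the IS equation Y = pi0 + pi1 * r, Eq. (18.2.14)."""
    denom = 1 - beta1 * (1 - alpha1)
    pi0 = (beta0 - alpha0 * beta1 + gamma0 + G) / denom
    pi1 = gamma1 / denom
    return pi0, pi1


def goods_market_gap(Y, r, beta0, beta1, alpha0, alpha1, gamma0, gamma1, G):
    """Y minus planned spending C + I + G, built from the structural equations."""
    T = alpha0 + alpha1 * Y      # tax function (18.2.8)
    Yd = Y - T                   # disposable income (18.2.10)
    C = beta0 + beta1 * Yd       # consumption function (18.2.7)
    I = gamma0 + gamma1 * r      # investment function (18.2.9)
    return Y - (C + I + G)       # zero when the identity (18.2.12) holds


# Hypothetical parameter values, for illustration only.
params = dict(beta0=50.0, beta1=0.8, alpha0=20.0, alpha1=0.25,
              gamma0=100.0, gamma1=-10.0, G=150.0)
pi0, pi1 = is_coefficients(**params)
print(f"IS curve: Y = {pi0:.2f} + ({pi1:.2f}) r")
for r in (2.0, 5.0, 8.0):
    Y = pi0 + pi1 * r
    gap = goods_market_gap(Y, r, **params)
    print(f"r = {r:.1f}: Y = {Y:.2f}, goods-market gap = {gap:.1e}")
```

With these values the common denominator is $1 - 0.8(0.75) = 0.4$, so $\pi_0 = 284/0.4 = 710$ and $\pi_1 = -10/0.4 = -25$; the gap is zero up to floating-point rounding at every interest rate tried.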


Equation (18.2.13) is the equation of the IS, or goods market equilibrium, that is, it gives the combinations 
of the interest rate and level of income such that the goods market clears or is in equilibrium. Geometrically, 
the IS curve is shown in Figure 18.3. 


[Figure 18.3 The IS curve. Horizontal axis: income; vertical axis: interest rate r.]
³“The goods market equilibrium schedule, or IS schedule, shows combinations of interest rates and levels of output such that planned spending equals income.” See Rudiger Dornbusch and Stanley Fischer, Macroeconomics, 3d ed., McGraw-Hill, New York, 1984, p. 102. Note that for simplicity we have assumed away the foreign trade sector.
What would happen if we were to estimate, say, the consumption function (18.2.7) in isolation? Could we obtain unbiased and/or consistent estimates of $\beta_0$ and $\beta_1$? Such a result is unlikely because consumption depends on disposable income, which depends on national income Y, but the latter depends on r and $\bar{G}$ as well as the other parameters entering in $\pi_0$. Therefore, unless we take into account all these influences, a simple regression of C on $Y_d$ is bound to give biased and/or inconsistent estimates of $\beta_0$ and $\beta_1$.


Example 18.5 The LM Model 


The other half of the famous IS-LM paradigm is the LM, or money market equilibrium, relation, which gives 
the combinations of the interest rate and level of income such that the money market is cleared, that is, 
the demand for money is equal to its supply. Algebraically, the model, in the nonstochastic form, may be 
expressed as: 


Money demand function:  $M_t^d = a + b Y_t - c r_t$  (18.2.15)
Money supply function:  $M_t^s = \bar{M}$  (18.2.16)
Equilibrium condition:  $M_t^d = M_t^s$  (18.2.17)

where Y = income, r = interest rate, and $\bar{M}$ = assumed level of money supply, say, determined by the Fed. Equating the money demand and supply functions and simplifying, we obtain the LM equation:

$$ Y_t = \lambda_0 + \lambda_1 \bar{M} + \lambda_2 r_t \tag{18.2.18} $$

where

$$ \lambda_0 = -\frac{a}{b}, \qquad \lambda_1 = \frac{1}{b}, \qquad \lambda_2 = \frac{c}{b} \tag{18.2.19} $$
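In the same spirit, a minimal Python sketch can confirm the LM coefficients in Eq. (18.2.19). The values of a, b, c, and $\bar M$ below are hypothetical (b and c are assumed positive), and `lm_coefficients` and `money_market_gap` are illustrative names; the check is that every (r, Y) pair on the LM line makes money demand (18.2.15) equal to the fixed money supply (18.2.16).

```python
def lm_coefficients(a, b, c):
    """lambda0, lambda1, lambda2 of the LM equation (18.2.18), from Eq. (18.2.19)."""
    return -a / b, 1 / b, c / b


def money_market_gap(Y, r, M_bar, a, b, c):
    """Money demand (18.2.15) minus money supply (18.2.16)."""
    return (a + b * Y - c * r) - M_bar


# Hypothetical values, for illustration only.
a, b, c, M_bar = 100.0, 0.5, 40.0, 500.0
lam0, lam1, lam2 = lm_coefficients(a, b, c)
print(f"LM curve: Y = {lam0:.1f} + {lam1:.1f} * {M_bar:.0f} + {lam2:.1f} r")
for r in (2.0, 5.0, 8.0):
    Y = lam0 + lam1 * M_bar + lam2 * r
    gap = money_market_gap(Y, r, M_bar, a, b, c)
    print(f"r = {r:.1f}: Y = {Y:.1f}, money-market gap = {gap:.1e}")
```

Because $\lambda_2 = c/b > 0$ under these assumptions, income and the interest rate move together along the LM line, so it slopes upward in the income-interest rate plane.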


For a given $M = \bar{M}$, the LM curve representing the relation (18.2.18) is as shown in Figure 18.4.

The IS and LM curves show, respectively, that a whole array of interest rates is consistent with goods market 
equilibrium and a whole array of interest rates is compatible with equilibrium in the money market. Of course, 
only one interest rate and one level of income will be consistent simultaneously with the two equilibria. To 
obtain these, all that needs to be done is to equate Eqs. (18.2.13) and (18.2.18). In Exercise 18.4 you are 
asked to show the level of the interest rate and income that is simultaneously compatible with the goods and 
money market equilibrium. 


[Figure 18.4 The LM curve, LM($M = \bar{M}$). Horizontal axis: income; vertical axis: interest rate.]
Example 18.6 Econometric Models


Simultaneous-equation models have been used extensively in the econometric models built by several econometricians. An early pioneer in this field was Professor Lawrence Klein of the Wharton School of the University of Pennsylvania. His initial model, known as Klein's model I, is as follows:


Consumption function:  $C_t = \beta_0 + \beta_1 P_t + \beta_2 (W + W')_t + \beta_3 P_{t-1} + u_{1t}$
Investment function:  $I_t = \beta_4 + \beta_5 P_t + \beta_6 P_{t-1} + \beta_7 K_{t-1} + u_{2t}$
Demand for labor:  $W_t = \beta_8 + \beta_9 (Y + T - W')_t + \beta_{10} (Y + T - W')_{t-1} + \beta_{11} t + u_{3t}$  (18.2.20)
Identity:  $Y_t + T_t = C_t + I_t + G_t$
Identity:  $Y_t = W'_t + W_t + P_t$
Identity:  $K_t = K_{t-1} + I_t$

where C = consumption expenditure
I = investment expenditure
G = government expenditure
P = profits
W = private wage bill
W' = government wage bill
K = capital stock
T = taxes
Y = income after tax
t = time
$u_1$, $u_2$, and $u_3$ = stochastic disturbances⁴

In the preceding model the variables C, I, W, Y, P, and K are treated as jointly dependent, or endogenous, variables and the variables $P_{t-1}$, $K_{t-1}$, and $Y_{t-1}$ are treated as predetermined.⁵ In all, there are six equations (including the three identities) to study the interdependence of six endogenous variables.

In Chapter 20 we shall see how such econometric models are estimated. For the time being, note that 
because of the interdependence among the endogenous variables, in general they are not independent of 
the stochastic disturbance terms, which therefore makes it inappropriate to apply the method of OLS to an 
individual equation in the system. As shown in Section 18.3, the estimators thus obtained are inconsistent; 
they do not converge to their true population values even when the sample size is very large. 


18.3 The Simultaneous-Equation Bias: Inconsistency of OLS Estimators 


As stated previously, the method of least squares may not be applied to estimate a single equation embedded in a system of simultaneous equations if one or more of the explanatory variables are correlated with the disturbance term in that equation, because the estimators thus obtained are inconsistent. To show this, let us revert to the simple Keynesian model of income determination given in Example 18.2. Suppose that we want to estimate the parameters of the consumption function (18.2.3). Assuming that $E(u_t) = 0$, $E(u_t^2) = \sigma^2$, $E(u_t u_{t+j}) = 0$ (for $j \neq 0$), and $\operatorname{cov}(I_t, u_t) = 0$, which are the assumptions of the classical linear regression model, we first show that $Y_t$ and $u_t$ in (18.2.3) are correlated and then prove that $\hat\beta_1$ is an inconsistent estimator of $\beta_1$.


⁴L. R. Klein, Economic Fluctuations in the United States, 1921-1941, John Wiley & Sons, New York, 1950.


⁵The model builder will have to specify which of the variables in a model are endogenous and which are predetermined. $K_{t-1}$ and $Y_{t-1}$ are predetermined because at time t their values are known. (More on this in Chapter 19.)
To prove that $Y_t$ and $u_t$ are correlated, we proceed as follows. Substitute Eq. (18.2.3) into Eq. (18.2.4) to obtain

$$ Y_t = \beta_0 + \beta_1 Y_t + u_t + I_t $$

that is,

$$ Y_t = \frac{\beta_0}{1 - \beta_1} + \frac{1}{1 - \beta_1} I_t + \frac{1}{1 - \beta_1} u_t \tag{18.3.1} $$

Now

$$ E(Y_t) = \frac{\beta_0}{1 - \beta_1} + \frac{1}{1 - \beta_1} I_t \tag{18.3.2} $$

where use is made of the fact that $E(u_t) = 0$ and that $I_t$, being exogenous, or predetermined (because it is fixed in advance), has as its expected value $I_t$.

Therefore, subtracting Eq. (18.3.2) from Eq. (18.3.1) results in

$$ Y_t - E(Y_t) = \frac{u_t}{1 - \beta_1} \tag{18.3.3} $$

Moreover,

$$ u_t - E(u_t) = u_t \qquad \text{(Why?)} \tag{18.3.4} $$

whence

$$ \operatorname{cov}(Y_t, u_t) = E[Y_t - E(Y_t)][u_t - E(u_t)] = \frac{E(u_t^2)}{1 - \beta_1} = \frac{\sigma^2}{1 - \beta_1} \tag{18.3.5} $$

where the second equality follows from Eqs. (18.3.3) and (18.3.4).

Since $\sigma^2$ is positive by assumption (why?), the covariance between Y and u given in Eq. (18.3.5) is bound to be different from zero.⁶ As a result, $Y_t$ and $u_t$ in Eq. (18.2.3) are expected to be correlated, which violates the assumption of the classical linear regression model that the disturbances are independent or at least uncorrelated with the explanatory variables. As noted previously, the OLS estimators in this situation are inconsistent.
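Equation (18.3.5) can also be checked by simulation. The minimal Python sketch below holds investment fixed at a single hypothetical value, $I_t = 3.0$, uses the parameter values of the Monte Carlo design in Section 18.4 ($\beta_0 = 2$, $\beta_1 = 0.8$, $\sigma^2 = 0.04$), draws many normal disturbances (normality is an added assumption), generates $Y_t$ from the reduced form (18.3.1), and compares the simulated covariance of $Y_t$ and $u_t$ with $\sigma^2/(1 - \beta_1)$.

```python
import random

beta0, beta1, sigma2 = 2.0, 0.8, 0.04    # values used in Section 18.4
I_t = 3.0                                 # hypothetical fixed investment level
rng = random.Random(12345)                # fixed seed for reproducibility

n_draws = 100_000
u = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n_draws)]
Y = [(beta0 + I_t + e) / (1 - beta1) for e in u]    # reduced form (18.3.1)

mean_u = sum(u) / n_draws
mean_Y = sum(Y) / n_draws
cov_Yu = sum((y - mean_Y) * (e - mean_u) for y, e in zip(Y, u)) / n_draws

print(f"simulated cov(Y, u)   = {cov_Yu:.4f}")
print(f"sigma^2 / (1 - beta1) = {sigma2 / (1 - beta1):.4f}")
```

Because $Y_t$ is an exact linear function of $u_t$ here, the simulated covariance is simply the sample variance of the draws divided by $1 - \beta_1$, so it lands close to 0.2.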


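To show that the OLS estimator $\hat\beta_1$ is an inconsistent estimator of $\beta_1$ because of correlation between $Y_t$ and $u_t$, we proceed as follows:

$$ \hat\beta_1 = \frac{\sum (C_t - \bar{C})(Y_t - \bar{Y})}{\sum (Y_t - \bar{Y})^2} = \frac{\sum c_t y_t}{\sum y_t^2} = \frac{\sum C_t y_t}{\sum y_t^2} \tag{18.3.6} $$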
⁶It will be greater than zero as long as $\beta_1$, the MPC, lies between 0 and 1, and it will be negative if $\beta_1$ is greater than unity. Of course, a value of MPC greater than unity would not make much economic sense. In reality, therefore, the covariance between $Y_t$ and $u_t$ is expected to be positive.
where the lowercase letters, as usual, indicate deviations from the (sample) mean values. Substituting for $C_t$ from Eq. (18.2.3), we obtain

$$ \hat\beta_1 = \frac{\sum (\beta_0 + \beta_1 Y_t + u_t) y_t}{\sum y_t^2} = \beta_1 + \frac{\sum y_t u_t}{\sum y_t^2} \tag{18.3.7} $$

where in the last step use is made of the fact that $\sum y_t = 0$ and $\sum Y_t y_t / \sum y_t^2 = 1$ (why?).
If we take the expectation of Eq. (18.3.7) on both sides, we obtain

$$ E(\hat\beta_1) = \beta_1 + E\left[\frac{\sum y_t u_t}{\sum y_t^2}\right] \tag{18.3.8} $$


Unfortunately, we cannot evaluate $E(\sum y_t u_t / \sum y_t^2)$, since the expectations operator is a linear operator and cannot be taken separately through the numerator and denominator of a ratio. [Note: $E(A/B) \neq E(A)/E(B)$.] But intuitively it should be clear that unless the term $(\sum y_t u_t / \sum y_t^2)$ is zero, $\hat\beta_1$ is a biased estimator of $\beta_1$. But have we not shown in Eq. (18.3.5) that the covariance between Y and u is nonzero, and therefore would $\hat\beta_1$ not be biased? The answer is, not quite, since $\operatorname{cov}(Y_t, u_t)$, a population concept, is not quite $\sum y_t u_t$, which is a sample measure, although as the sample size increases indefinitely the sample covariance $\sum y_t u_t / n$ will tend toward the population covariance. But if the sample size increases indefinitely, then we can resort to the concept of a consistent estimator and find out what happens to $\hat\beta_1$ as n, the sample size, increases indefinitely. In short, when we cannot explicitly evaluate the expected value of an estimator, as in Eq. (18.3.8), we can turn our attention to its behavior in large samples.
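Although $E(\hat\beta_1)$ in Eq. (18.3.8) cannot be evaluated analytically, it can be approximated by repeated sampling. The minimal Python sketch below is an illustration under stated assumptions: it keeps investment fixed at the column (3) values of Table 18.1, uses $\beta_0 = 2$, $\beta_1 = 0.8$, and $\sigma = 0.2$ from Section 18.4, draws fresh normal disturbances in each replication (normality is our assumption), and averages the resulting OLS slopes. The helper names `ols_slope` and `one_replication` are hypothetical.

```python
import random

beta0, beta1, sigma = 2.0, 0.8, 0.2       # sigma^2 = 0.04, as in Section 18.4
# Investment held fixed across replications (column 3 of Table 18.1).
I = [2.0, 2.0, 2.2, 2.2, 2.4, 2.4, 2.6, 2.6, 2.8, 2.8,
     3.0, 3.0, 3.2, 3.2, 3.4, 3.4, 3.6, 3.6, 3.8, 3.8]
n = len(I)
rng = random.Random(7)                    # fixed seed for reproducibility


def ols_slope(x, y):
    """OLS slope of y on x with an intercept: sum(x_dev * y_dev) / sum(x_dev ** 2)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def one_replication():
    """Generate one sample of size n from the model and return the OLS slope."""
    u = [rng.gauss(0.0, sigma) for _ in range(n)]
    Y = [(beta0 + i + e) / (1 - beta1) for i, e in zip(I, u)]   # Eq. (18.3.1)
    C = [beta0 + beta1 * y + e for y, e in zip(Y, u)]           # Eq. (18.2.3)
    return ols_slope(Y, C)


n_reps = 10_000
estimates = [one_replication() for _ in range(n_reps)]
mean_b1 = sum(estimates) / n_reps
print(f"average of {n_reps} OLS estimates of beta1: {mean_b1:.4f} (true value {beta1})")
```

The average of the replicated slopes sits above the true 0.8, which is the small-sample counterpart of the bias term in Eq. (18.3.8).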

Now an estimator is said to be consistent if its probability limit,⁷ or plim for short, is equal to its true (population) value. Therefore, to show that $\hat\beta_1$ of Eq. (18.3.7) is inconsistent, we must show that its plim is not equal to the true $\beta_1$. Applying the rules of probability limit to Eq. (18.3.7), we obtain:⁸


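$$ \operatorname{plim}(\hat\beta_1) = \operatorname{plim}(\beta_1) + \operatorname{plim}\left(\frac{\sum y_t u_t}{\sum y_t^2}\right) = \beta_1 + \frac{\operatorname{plim}(\sum y_t u_t / n)}{\operatorname{plim}(\sum y_t^2 / n)} \tag{18.3.9} $$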


where in the second step we have divided $\sum y_t u_t$ and $\sum y_t^2$ by the total number of observations in the sample, n, so that the quantities in the parentheses are now the sample covariance between Y and u and the sample variance of Y, respectively.

⁷See Appendix A for the definition of probability limit.
⁸As stated in Appendix A, the plim of a constant (for example, $\beta_1$) is the same constant and the plim of $(A/B) = \operatorname{plim}(A)/\operatorname{plim}(B)$. Note, however, that $E(A/B) \neq E(A)/E(B)$.

In words, Eq. (18.3.9) states that the probability limit of $\hat\beta_1$ is equal to the true $\beta_1$ plus the ratio of the plim of the sample covariance between Y and u to the plim of the sample variance of Y. Now as the sample size n increases indefinitely, one would expect the sample covariance between Y and u to approximate the true population covariance $E[Y_t - E(Y_t)][u_t - E(u_t)]$, which from Eq. (18.3.5) is equal to $\sigma^2/(1 - \beta_1)$. Similarly, as n tends to infinity, the sample variance of Y will approximate its population variance, say $\sigma_Y^2$. Therefore, Eq. (18.3.9) may be written as


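$$ \operatorname{plim}(\hat\beta_1) = \beta_1 + \frac{\sigma^2/(1 - \beta_1)}{\sigma_Y^2} = \beta_1 + \frac{1}{1 - \beta_1}\left(\frac{\sigma^2}{\sigma_Y^2}\right) \tag{18.3.10} $$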


Given that $0 < \beta_1 < 1$ and that $\sigma^2$ and $\sigma_Y^2$ are both positive, it is obvious from Eq. (18.3.10) that $\operatorname{plim}(\hat\beta_1)$ will always be greater than $\beta_1$; that is, $\hat\beta_1$ will overestimate the true $\beta_1$.⁹ In other words, $\hat\beta_1$ is a biased estimator, and the bias will not disappear however large the sample size; that is, $\hat\beta_1$ is also inconsistent.
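Equation (18.3.10) describes what happens as n grows, and a minimal Python sketch can illustrate it. To generate arbitrarily long samples, the sketch assumes that investment is drawn independently of $u_t$ from a normal distribution with mean 2.9 and variance 0.33 (the mean and variance of column 3 of Table 18.1, used here only as an assumption); under that assumption Eq. (18.3.1) implies $\sigma_Y^2 = (\operatorname{var}(I) + \sigma^2)/(1 - \beta_1)^2$. It compares OLS slopes from samples of increasing size with the probability limit given by Eq. (18.3.10). The helpers `ols_slope` and `simulate` are hypothetical names.

```python
import random

beta0, beta1, sigma2 = 2.0, 0.8, 0.04     # values used in Section 18.4
mean_I, var_I = 2.9, 0.33                 # assumed distribution of investment
rng = random.Random(2024)                 # fixed seed for reproducibility


def ols_slope(x, y):
    """OLS slope of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n):
    """Draw one sample of size n from the Keynesian model and return the OLS slope."""
    I = [rng.gauss(mean_I, var_I ** 0.5) for _ in range(n)]
    u = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    Y = [(beta0 + i + e) / (1 - beta1) for i, e in zip(I, u)]   # Eq. (18.3.1)
    C = [beta0 + beta1 * y + e for y, e in zip(Y, u)]           # Eq. (18.2.3)
    return ols_slope(Y, C)


# Probability limit from Eq. (18.3.10), with sigma_Y^2 implied by Eq. (18.3.1).
sigma2_Y = (var_I + sigma2) / (1 - beta1) ** 2
plim_b1 = beta1 + (sigma2 / (1 - beta1)) / sigma2_Y

slopes = {n: simulate(n) for n in (20, 1_000, 100_000)}
for n, b1_hat in slopes.items():
    print(f"n = {n:>7}: OLS slope = {b1_hat:.4f}")
print(f"plim from Eq. (18.3.10): {plim_b1:.4f}; true beta1 = {beta1}")
```

Under these assumptions the probability limit is $0.8 + 0.2/9.25 \approx 0.8216$; the slope from the largest sample settles near that value rather than near the true 0.8, which is what inconsistency means.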


18.4 The Simultaneous-Equation Bias: A Numerical Example 


To demonstrate some of the points made in the preceding section, let us return to the simple Keynesian model of income determination given in Example 18.2 and carry out the following Monte Carlo study.¹⁰ Assume that the values of investment I are as shown in column 3 of Table 18.1. Further assume that


$E(u_t) = 0$
$E(u_t u_{t+j}) = 0 \quad (j \neq 0)$
$\operatorname{var}(u_t) = \sigma^2 = 0.04$
$\operatorname{cov}(u_t, I_t) = 0$


The $u_t$ thus generated are shown in column 4.

For the consumption function (18.2.3) assume that the values of the true parameters are known and are $\beta_0 = 2$ and $\beta_1 = 0.8$.

From the assumed values of $\beta_0$ and $\beta_1$ and the generated values of $u_t$ we can generate the values of income $Y_t$ from Eq. (18.3.1), which are shown in column 1 of Table 18.1. Once $Y_t$ are known, and knowing $\beta_0$, $\beta_1$, and $u_t$, one can easily generate the values of consumption $C_t$ from Eq. (18.2.3). The C's thus generated are given in column 2.

Since the true $\beta_0$ and $\beta_1$ are known, and since our sample errors are exactly the same as the “true” errors (because of the way we designed the Monte Carlo study), if we use the data of Table 18.1 to regress $C_t$ on $Y_t$ we should obtain $\beta_0 = 2$ and $\beta_1 = 0.8$, if OLS were unbiased. But from Eq. (18.3.7) we know that this will not be the case if the regressor $Y_t$ and the disturbance $u_t$ are correlated. Now it is not too difficult to verify from our data that the (sample) covariance between $Y_t$ and $u_t$ is $\sum y_t u_t = 3.8$ and that $\sum y_t^2 = 184$. Then, as Eq. (18.3.7) shows, we should have

$$ \hat\beta_1 = \beta_1 + \frac{\sum y_t u_t}{\sum y_t^2} = 0.8 + \frac{3.8}{184} = 0.82065 \tag{18.4.1} $$

That is, $\hat\beta_1$ is upward-biased by 0.02065.


⁹In general, however, the direction of the bias will depend on the structure of the particular model and the true values of the regression coefficients.


¹⁰This is borrowed from Kenneth J. White, Nancy G. Horsman, and Justin B. Wyatt, SHAZAM: Computer Handbook for Econometrics for Use with Basic Econometrics, McGraw-Hill, New York, 1985, pp. 131-134.




Table 18.1
   Y_t          C_t        I_t      u_t
   (1)          (2)        (3)      (4)
18.15697     16.15697     2.0    -0.3686055
19.59980     17.59980     2.0    -0.8004084E-01
21.93468     19.73468     2.2     0.1869357
21.55145     19.35145     2.2     0.1102906
21.88427     19.48427     2.4    -0.2314535E-01
22.42648     20.02648     2.4     0.8529544E-01
25.40940     22.80940     2.6     0.4818807
22.69523     20.09523     2.6    -0.6095481E-01
24.36465     21.56465     2.8     0.7292983E-01
24.39334     21.59334     2.8     0.7866819E-01
24.09215     21.09215     3.0    -0.1815703
24.87450     21.87450     3.0    -0.2509900E-01
25.31580     22.11580     3.2    -0.1368398
26.30465     23.10465     3.2     0.6092946E-01
25.78235     22.38235     3.4    -0.2435298
26.08018     22.68018     3.4    -0.1839638
27.24440     23.64440     3.6    -0.1511200
28.00963     24.40963     3.6     0.1926739E-02
30.89301     27.09301     3.8     0.3786015
28.98706     25.18706     3.8    -0.2588852E-02


Source: Kenneth J. White, Nancy G. Horsman, and Justin B. Wyatt, SHAZAM: Computer Handbook for Econometrics for Use 
with Damodar Gujarati: Basic Econometrics, September 1985, p. 132. 


Now let us regress $C_t$ on $Y_t$ using the data given in Table 18.1. The regression results are

$$ \hat{C}_t = 1.4940 + 0.82065\, Y_t \tag{18.4.2} $$
se = (0.35413)   (0.01434)
t = (4.2188)   (57.209)     $R^2 = 0.9945$


As expected, the estimated $\beta_1$ is precisely the one predicted by Eq. (18.4.1). In passing, note that the estimated $\beta_0$ too is biased.
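The figures in Eqs. (18.4.1) and (18.4.2) can be reproduced directly from Table 18.1. The minimal Python sketch below (standard library only; the variable names are illustrative) first checks that each row satisfies the reduced form (18.3.1) with $\beta_0 = 2$ and $\beta_1 = 0.8$, then computes $\sum y_t u_t$, $\sum y_t^2$, and the OLS intercept and slope of the regression of $C_t$ on $Y_t$.

```python
# Rows of Table 18.1: (Y_t, C_t, I_t, u_t)
table_18_1 = [
    (18.15697, 16.15697, 2.0, -0.3686055),
    (19.59980, 17.59980, 2.0, -0.8004084e-01),
    (21.93468, 19.73468, 2.2, 0.1869357),
    (21.55145, 19.35145, 2.2, 0.1102906),
    (21.88427, 19.48427, 2.4, -0.2314535e-01),
    (22.42648, 20.02648, 2.4, 0.8529544e-01),
    (25.40940, 22.80940, 2.6, 0.4818807),
    (22.69523, 20.09523, 2.6, -0.6095481e-01),
    (24.36465, 21.56465, 2.8, 0.7292983e-01),
    (24.39334, 21.59334, 2.8, 0.7866819e-01),
    (24.09215, 21.09215, 3.0, -0.1815703),
    (24.87450, 21.87450, 3.0, -0.2509900e-01),
    (25.31580, 22.11580, 3.2, -0.1368398),
    (26.30465, 23.10465, 3.2, 0.6092946e-01),
    (25.78235, 22.38235, 3.4, -0.2435298),
    (26.08018, 22.68018, 3.4, -0.1839638),
    (27.24440, 23.64440, 3.6, -0.1511200),
    (28.00963, 24.40963, 3.6, 0.1926739e-02),
    (30.89301, 27.09301, 3.8, 0.3786015),
    (28.98706, 25.18706, 3.8, -0.2588852e-02),
]
Y, C, I, u = (list(col) for col in zip(*table_18_1))
n = len(Y)
beta0, beta1 = 2.0, 0.8                  # true parameters of the design

# Largest discrepancy between column (1) and the reduced form (18.3.1).
max_err = max(abs(y - (beta0 + i + e) / (1 - beta1)) for y, i, e in zip(Y, I, u))

mean_Y, mean_C = sum(Y) / n, sum(C) / n
y = [v - mean_Y for v in Y]              # deviations from the sample mean
sum_yu = sum(a * e for a, e in zip(y, u))
sum_y2 = sum(a * a for a in y)

b1_hat = sum(a * (c - mean_C) for a, c in zip(y, C)) / sum_y2   # Eq. (18.3.6)
b0_hat = mean_C - b1_hat * mean_Y

print(f"max reduced-form error: {max_err:.1e}")
print(f"sum y*u = {sum_yu:.3f}, sum y^2 = {sum_y2:.3f}")
print(f"beta1 + sum(yu)/sum(y^2) = {beta1 + sum_yu / sum_y2:.5f}")   # Eq. (18.4.1)
print(f"OLS: C_hat = {b0_hat:.4f} + {b1_hat:.5f} Y")                 # Eq. (18.4.2)
```

Up to rounding in the published table, the sketch reproduces $\sum y_t u_t = 3.8$, $\sum y_t^2 = 184$, and the estimates 1.4940 and 0.82065 reported in Eq. (18.4.2).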

In general, the amount of the bias in $\hat\beta_1$ depends on $\beta_1$, $\sigma^2$, and var(Y) and, in particular, on the degree of covariance between Y and u.¹¹ As Kenneth White et al. note, “This is what simultaneous equation bias is all about. In contrast to single equation models, we can no longer assume that variables on the right hand side of the equation are uncorrelated with the error term.”¹² Bear in mind that this bias remains even in large samples.

In view of the potentially serious consequences of applying OLS in simultaneous-equation models, is 
there a test of simultaneity that can tell us whether in a given instance we have the simultaneity problem? 
One version of the Hausman specification test can be used for this purpose, which we discuss in Chapter 19. 


¹¹See Eq. (18.3.5).
¹²Op. cit., pp. 133-134.
Summary and Conclusions


1. In contrast to single-equation models, in simultaneous-equation models more than one dependent, 
or endogenous, variable is involved, necessitating as many equations as the number of endogenous 
variables. 

2. A unique feature of simultaneous-equation models is that the endogenous variable (i.e., regressand) in one equation may appear as an explanatory variable (i.e., regressor) in another equation of the system.

3. As a consequence, such an endogenous explanatory variable becomes stochastic and is usually corre- 
lated with the disturbance term of the equation in which it appears as an explanatory variable. 

4. In this situation the classical OLS method may not be applied because the estimators thus obtained 
are not consistent, that is, they do not converge to their true population values no matter how large the 
sample size. 

5. The Monte Carlo example presented in the text shows the nature of the bias involved in applying 
OLS to estimate the parameters of a regression equation in which the regressor is correlated with the 
disturbance term, which is typically the case in simultaneous-equation models. 

6. Since simultaneous-equation models are used frequently, especially in econometric models, alternative 
estimating techniques have been developed by various authors. These are discussed in Chapter 20, 
after the topic of the identification problem is considered in Chapter 19, a topic logically prior to 
estimation. 


Multiple Choice Questions 


1. Applying OLS to simultaneous equations results in the parameters being 
a. Inefficient 
b. Inconsistent 
c. Biased 
d. Biased and inconsistent 
2. The simultaneous-equation bias
a. Refers to the bias of the researcher towards using this model
b. Is the bias in the estimated parameters that disappears when the sample size becomes large
c. Is the bias in the estimated parameters that does not disappear even when the sample size becomes large
d. Means that the error terms are biased positively in small samples
3. In a simultaneous-equation model, the number of equations to be estimated is
a. One more than the number of endogenous variables
b. Equal to the number of endogenous variables
c. Dependent on the underlying economic theory
d. Equal to the number of endogenous and exogenous variables
4. In contrast to single-equation models, in a simultaneous-equation model there must be more than one
a. Endogenous variable
b. Exogenous variable
c. Parameter to be estimated
d. Equation to be estimated


5. In a simultaneous-equation model, the endogenous variable in one equation may appear as
a. Regressand in another equation
b. Regressor in another equation
c. Parameter in another equation
d. Dependent variable in another equation
Exercises 
Questions 
18.1. Develop a simultaneous-equation model for the supply of and demand for dentists in the United 
States. Specify the endogenous and exogenous variables in the model. 
18.2. Develop a simple model of the demand for and supply of money in the United States and compare your model with those developed by K. Brunner and A. H. Meltzer* and R. Teigen.†
18.3. a. For the demand-and-supply model of Example 18.1, obtain the expression for the probability limit of $\hat\alpha_1$.
b. Under what conditions will this probability limit be equal to the true $\alpha_1$?
18.4. For the IS-LM model discussed in the text, find the level of interest rate and income that is simultane- 
ously compatible with the goods and money market equilibrium. 
18.5. To study the relationship between inflation and yield on common stock, Bruno Oudet‡ used the following model:
$R_{bt} = \alpha_1 + \alpha_2 R_{st} + \alpha_3 R_{b,t-1} + \alpha_4 L_t + \alpha_5 Y_t + \alpha_6 NIS_t + \alpha_7 I_t + u_{1t}$
$R_{st} = \beta_1 + \beta_2 R_{bt} + \beta_3 R_{b,t-1} + \beta_4 L_t + \beta_5 Y_t + \beta_6 NIS_t + \beta_7 E_t + u_{2t}$
where L = real per capita monetary base
Y = real per capita income
I = the expected rate of inflation
NIS = a new issue variable
E = expected end-of-period stock returns, proxied by lagged stock price ratios
$R_b$ = bond yield
$R_s$ = common stock returns
a. Offer a theoretical justification for this model and see if your reasoning agrees with that of Oudet.
b. Which are the endogenous variables in the model? Which are the exogenous variables?
c. How would you treat the lagged $R_b$: as endogenous or exogenous?
18.6. In their article, “A Model of the Distribution of Branded Personal Products in Jamaica,”§ John U. Farley and Harold J. Levitt developed the following model (the personal products considered were shaving cream, skin cream, sanitary napkins, and toothpaste):

$Y_{1i} = \alpha_1 + \beta_1 Y_{2i} + \beta_2 Y_{3i} + \beta_3 Y_{4i} + u_{1i}$
$Y_{2i} = \alpha_2 + \beta_4 Y_{1i} + \beta_5 Y_{5i} + \gamma_1 X_{1i} + \gamma_2 X_{2i} + u_{2i}$
$Y_{3i} = \alpha_3 + \beta_6 Y_{2i} + \gamma_3 X_{3i} + u_{3i}$
$Y_{4i} = \alpha_4 + \beta_7 Y_{2i} + \gamma_4 X_{4i} + u_{4i}$
$Y_{5i} = \alpha_5 + \beta_8 Y_{2i} + \beta_9 Y_{3i} + \beta_{10} Y_{4i} + u_{5i}$

where $Y_1$ = percent of stores stocking the product
$Y_2$ = sales in units per month
$Y_3$ = index of direct contact with importer and manufacturer for the product
$Y_4$ = index of wholesale activity in the area
$Y_5$ = index of depth of brand stocking for the product (i.e., average number of brands of the product stocked by stores carrying the product)
$X_1$ = target population for the product
$X_2$ = income per capita in the parish where the area is
$X_3$ = distance from the population center of gravity to Kingston
$X_4$ = distance from population center to nearest wholesale town
a. Can you identify the endogenous and exogenous variables in the preceding model?
b. Can one or more equations in the model be estimated by the method of least squares? Why or why not?

*“Some Further Evidence on Supply and Demand Functions for Money,” Journal of Finance, vol. 19, May 1964, pp. 240-283.
†“Demand and Supply Functions for Money in the United States,” Econometrica, vol. 32, no. 4, October 1964, pp. 476-509.
‡Bruno A. Oudet, “The Variation of the Return on Stocks in Periods of Inflation,” Journal of Financial and Quantitative Analysis, vol. 8, no. 2, March 1973, pp. 247-258.
§Journal of Marketing Research, November 1968, pp. 362-368.
18.7. To study the relationship between advertising expenditure and sales of cigarettes, Frank Bass used the following model:*
$Y_{1t} = \alpha_1 + \beta_1 Y_{3t} + \beta_2 Y_{4t} + \gamma_1 X_{1t} + \gamma_2 X_{2t} + u_{1t}$
$Y_{2t} = \alpha_2 + \beta_3 Y_{3t} + \beta_4 Y_{4t} + \gamma_3 X_{1t} + \gamma_4 X_{2t} + u_{2t}$
$Y_{3t} = \alpha_3 + \beta_5 Y_{1t} + \beta_6 Y_{2t} + u_{3t}$
$Y_{4t} = \alpha_4 + \beta_7 Y_{1t} + \beta_8 Y_{2t} + u_{4t}$
where $Y_1$ = logarithm of sales of filter cigarettes (number of cigarettes) divided by population over age 20
$Y_2$ = logarithm of sales of nonfilter cigarettes (number of cigarettes) divided by population over age 20
$Y_3$ = logarithm of advertising dollars for filter cigarettes divided by population over age 20 divided by advertising price index
$Y_4$ = logarithm of advertising dollars for nonfilter cigarettes divided by population over age 20 divided by advertising price index
$X_1$ = logarithm of disposable personal income divided by population over age 20 divided by consumer price index
$X_2$ = logarithm of price per package of nonfilter cigarettes divided by consumer price index
a. In the preceding model the Y's are endogenous and the X's are exogenous. Why does the author assume $X_2$ to be exogenous?
b. If $X_2$ is treated as an endogenous variable, how would you modify the preceding model?
18.8. G. Menges developed the following econometric model for the West German economy:†
$Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 I_t + u_{1t}$
$I_t = \beta_3 + \beta_4 Y_t + \beta_5 Q_t + u_{2t}$
$C_t = \beta_6 + \beta_7 Y_t + \beta_8 C_{t-1} + \beta_9 P_t + u_{3t}$
$Q_t = \beta_{10} + \beta_{11} Q_{t-1} + \beta_{12} R_t + u_{4t}$
where Y = national income
I = net capital formation
C = personal consumption
Q = profits
P = cost of living index
R = industrial productivity
t = time
u = stochastic disturbances
a. Which of the variables would you regard as endogenous and which as exogenous?
b. Is there any equation in the system that can be estimated by the single-equation least-squares method?
c. What is the reason behind including the variable P in the consumption function?

*“A Simultaneous Equation Regression Study of Advertising and Sales of Cigarettes,” Journal of Marketing Research, vol. 6, August 1969, pp. 291-300.
†G. Menges, “Ein Ökonometrisches Modell der Bundesrepublik Deutschland (Vier Strukturgleichungen),” I.F.O. Studien, vol. 5, 1959, pp. 1-22.
18.9. L. E. Gallaway and P. E. Smith developed a simple model for the United States economy, which is as follows:‡
$Y_t = C_t + I_t + G_t$
$C_t = \beta_1 + \beta_2 YD_{t-1} + \beta_3 M_t + u_{1t}$
$I_t = \beta_4 + \beta_5 (Y_{t-1} - Y_{t-2}) + \beta_6 Z_{t-1} + u_{2t}$
$G_t = \beta_7 + \beta_8 G_{t-1} + u_{3t}$
where Y = gross national product
C = personal consumption expenditure
I = gross private domestic investment
G = government expenditure plus net foreign investment
YD = disposable, or after-tax, income
M = money supply at the beginning of the quarter
Z = property income before taxes
t = time
$u_1$, $u_2$, and $u_3$ = stochastic disturbances
All variables are measured in the first-difference form.
Using quarterly data for 1948-1957, the authors applied the least-squares method to each equation individually and obtained the following results:

$\hat{C}_t = 0.09 + 0.43\, YD_{t-1} + 0.23\, M_t \qquad R^2 = 0.23$
$\hat{I}_t = 0.08 + 0.43 (Y_{t-1} - Y_{t-2}) + 0.48\, Z_t \qquad R^2 = 0.40$
$\hat{G}_t = 0.13 + 0.67\, G_{t-1} \qquad R^2 = 0.42$

a. How would you justify the use of the single-equation least-squares method in this case?
b. Why are the $R^2$ values rather low?

‡“A Quarterly Econometric Model of the United States,” Journal of the American Statistical Association, vol. 56, 1961, pp. 379-383.
Empirical Exercises

18.10. Table 18.2 gives you data on Y (gross domestic product), I (gross private domestic investment), and C (personal consumption expenditure) for the United States for the period 1970-2006. All data are in billions of 1996 dollars. Assume that C is linearly related to Y as in the simple Keynesian model of income determination of Example 18.2. Obtain OLS estimates of the parameters of the consumption function. Save the results for another look at the same data using the methods developed in Chapter 20.


Table 18.2 Personal Consumption Expenditure, Gross Private Domestic Investment, and GDP, United States, 
1970-2006 (billions of 1996 dollars) 


Observation      C         I          Y       Observation      C           I           Y
1970        2,451.9     427.1    3,771.9      1989        4,675.0       926.2     6,981.4
1971        2,545.5     475.7    3,898.6      1990        4,770.3       895.1     7,112.5
1972        2,701.3     532.1    4,105.0      1991        4,778.4       822.2     7,100.5
1973        2,833.8     594.4    4,341.5      1992        4,934.8       889.0     7,336.6
1974        2,812.3     550.6    4,319.6      1993        5,099.8       968.3     7,532.7
1975        2,876.9     453.1    4,311.2      1994        5,290.7     1,099.6     7,835.5
1976        3,035.5     544.7    4,540.9      1995        5,433.5     1,134.0     8,031.7
1977        3,164.1     627.0    4,750.5      1996        5,619.4     1,234.3     8,328.9
1978        3,303.1     702.6    5,015.0      1997        5,831.8     1,387.7     8,703.5
1979        3,383.4     725.0    5,173.4      1998        6,125.8     1,524.1     9,066.9
1980        3,374.1     645.3    5,161.7      1999        6,438.6     1,642.6     9,470.3
1981        3,422.2     704.9    5,291.7      2000        6,739.4     1,735.5     9,817.0
1982        3,470.3     606.0    5,189.3      2001        6,910.4     1,598.4     9,890.7
1983        3,668.6     662.5    5,423.8      2002        7,099.3   [illegible]  10,048.8
1984        3,863.3     857.7    5,813.6      2003        7,295.3     1,613.1    10,301.0
1985        4,064.0     849.7    6,053.7      2004        7,561.4     1,770.2    10,675.8
1986        4,228.9     843.9    6,263.6      2005        7,803.6     1,869.3    11,003.4
1987        4,369.8     870.0    6,475.1      2006        8,044.1     1,919.5    11,319.4
1988        4,546.9     890.5    6,742.7


Notes: C = personal consumption expenditure. 
I = gross private domestic investment. 
Y = gross domestic product. 


Source: Economic Report of the President, 2008, Table B-2. 


18.11. Using the data given in Exercise 18.10, regress gross domestic investment I on GDP and save the results for further examination in a later chapter.
18.12. Consider the macroeconomic identity
$C_t + I_t = Y_t \quad (= \text{GDP})$
As before, assume that
$C_t = \beta_0 + \beta_1 Y_t + u_t$
and, following the accelerator model of macroeconomics, let
$I_t = \alpha_0 + \alpha_1 (Y_t - Y_{t-1}) + v_t$
where u and v are error terms. From the data given in Exercise 18.10, estimate the accelerator model and save the results for further study.
18.13. Supply and demand for gas. Table 18.3, found on the textbook website, gives data on some of the variables that determine demand for and supply of gasoline in the U.S. from January 1978 to August 2002.* The variables are: pricegas (cents per gallon); quantgas (thousands of barrels per day, unleaded); persincome (personal income, billions of dollars); and car sales (millions of cars per year).
a. Develop a suitable supply-and-demand model for gasoline consumption. 

b. Which variables in the model in (a) are endogenous and which are exogenous? 

c. If you estimate the demand-and-supply functions that you have developed by OLS, will your 
results be reliable? Why or why not? 

d. Save the OLS estimates of your demand-and-supply functions for another look after we discuss 
Chapter 20. 

18.14. Table 18.4, found on the textbook website, gives macroeconomic data on several variables for the U.S. economy for the quarterly periods 1951-I to 2000-IV.† The variables are as follows: Year = date; Qtr = quarter; Realgdp = real GDP (billions of dollars); Realcons = real consumption expenditure; Realinvs = real investment by private sector; Realgovt = real government expenditure; Realdpi = real disposable personal income; CPI_U = consumer price index; M1 = nominal money stock; Tbilrate = quarterly average of month-end 90-day T-bill rate; Pop = population, millions, interpolated from year-end figures using a constant growth rate per quarter; Infl = rate of inflation (first observation is missing); and Realint = ex post real interest rate = Tbilrate - Infl (first observation missing).

Using these data, develop a simple macroeconomic model of the U.S. economy. You will be asked 
to estimate this model in Chapter 20. 


Key to Multiple Choice Questions 


1. (d) 2. (c) 3. (b) 4. (a) 5. (b)


*These data are taken from the website of Stephen J. Schmidt, Econometrics, McGraw-Hill, New York, 2005. See www.mhhe.com/economics.
†These data are originally from the Department of Commerce, Bureau of Economic Analysis, and from www.economagic.com, and are reproduced from William H. Greene, Econometric Analysis, 6th ed., 2008, Table F5.1, p. 1083.


CHAPTER 19

The Identification Problem


In this chapter we consider the nature and significance of the identification problem. The crux of the identifi- 
cation problem is as follows: Recall the demand-and-supply model introduced in Section 18.2. Suppose that 
we have time series data on Q and P only and no additional information (such as income of the consumer, price 
prevailing in the previous period, and weather condition). The identification problem then consists in seeking 
an answer to this question: Given only the data on P and Q, how do we know whether we are estimating the 
demand function or the supply function? Alternatively, if we think we are fitting a demand function, how do 
we guarantee that it is, in fact, the demand function that we are estimating and not something else? 

A moment's reflection will reveal that an answer to the preceding question is necessary before one proceeds to estimate the parameters of our demand function. In this chapter we shall show how the identification problem is resolved. We first introduce some notation and definitions and then illustrate the identification problem with several examples. This is followed by the rules that may be used to find out whether an equation in a simultaneous-equation model is identified, that is, whether it is the relationship that we are actually estimating, be it the demand or supply function or something else.


19.1 Notations and Definitions 




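To facilitate our discussion, we introduce the following notation and definitions.

The general model of M equations in M endogenous, or jointly dependent, variables may be written as Eq. (19.1.1):

$$
\begin{aligned}
Y_{1t} &= \beta_{12}Y_{2t} + \beta_{13}Y_{3t} + \cdots + \beta_{1M}Y_{Mt} + \gamma_{11}X_{1t} + \gamma_{12}X_{2t} + \cdots + \gamma_{1K}X_{Kt} + u_{1t}\\
Y_{2t} &= \beta_{21}Y_{1t} + \beta_{23}Y_{3t} + \cdots + \beta_{2M}Y_{Mt} + \gamma_{21}X_{1t} + \gamma_{22}X_{2t} + \cdots + \gamma_{2K}X_{Kt} + u_{2t}\\
Y_{3t} &= \beta_{31}Y_{1t} + \beta_{32}Y_{2t} + \cdots + \beta_{3M}Y_{Mt} + \gamma_{31}X_{1t} + \gamma_{32}X_{2t} + \cdots + \gamma_{3K}X_{Kt} + u_{3t}\\
&\;\;\vdots\\
Y_{Mt} &= \beta_{M1}Y_{1t} + \beta_{M2}Y_{2t} + \cdots + \beta_{M,M-1}Y_{M-1,t} + \gamma_{M1}X_{1t} + \gamma_{M2}X_{2t} + \cdots + \gamma_{MK}X_{Kt} + u_{Mt}
\end{aligned}
\qquad (19.1.1)
$$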
where $Y_1, Y_2, \ldots, Y_M$ = M endogenous, or jointly dependent, variables
$X_1, X_2, \ldots, X_K$ = K predetermined variables (one of these X variables may take a value of unity to allow for the intercept term in each equation)
$u_1, u_2, \ldots, u_M$ = M stochastic disturbances
$t = 1, 2, \ldots, T$; T = total number of observations
β's = coefficients of the endogenous variables
γ's = coefficients of the predetermined variables
In passing, note that not each and every variable need appear in each equation. As a matter of fact, we see in 
Section 19.2 that this must not be the case if an equation is to be identified. 

As Eq. (19.1.1) shows, the variables entering a simultaneous-equation model are of two types: endog- 
enous, that is, those (whose values are) determined within the model; and predetermined, that is, those 
(whose values are) determined outside the model. The endogenous variables are regarded as stochastic, 
whereas the predetermined variables are treated as nonstochastic. 

The predetermined variables are divided into two categories: exogenous, current as well as lagged, and lagged endogenous. Thus, $X_{1t}$ is a current (present-time) exogenous variable, whereas $X_{1,t-1}$ is a lagged exogenous variable, with a lag of one time period. $Y_{1,t-1}$ is a lagged endogenous variable with a lag of one time period, but since the value of $Y_{1,t-1}$ is known at the current time t, it is regarded as nonstochastic, hence, a predetermined variable.¹ In short, current exogenous, lagged exogenous, and lagged endogenous variables are deemed predetermined; their values are not determined by the model in the current time period.

It is up to the model builder to specify which variables are endogenous and which are predetermined. 
Although (noneconomic) variables, such as temperature and rainfall, are clearly exogenous or predetermined, 
the model builder must exercise great care in classifying economic variables as endogenous or predeter- 
mined: He or she must defend the classification on a priori or theoretical grounds. However, later in the 
chapter we provide a statistical test of exogeneity. 

The equations appearing in (19.1.1) are known as the structural, or behavioral, equations because they 
may portray the structure (of an economic model) of an economy or the behavior of an economic agent 
(e.g., consumer or producer). The β's and γ's are known as the structural parameters or coefficients.

From the structural equations one can solve for the M endogenous variables and derive the reduced-form 
equations and the associated reduced-form coefficients. A reduced-form equation is one that expresses 
an endogenous variable solely in terms of the predetermined variables and the stochastic disturbances. 
To illustrate, consider the Keynesian model of income determination encountered in Chapter 18: 


Consumption function: $C_t = \beta_0 + \beta_1 Y_t + u_t$, $0 < \beta_1 < 1$  (18.2.3)
Income identity: $Y_t = C_t + I_t$  (18.2.4)


In this model C (consumption) and Y (income) are the endogenous variables and I (investment expenditure) is treated as an exogenous variable. Both these equations are structural equations, Eq. (18.2.4) being an identity. As usual, the MPC $\beta_1$ is assumed to lie between 0 and 1.

If Eq. (18.2.3) is substituted into Eq. (18.2.4), we obtain, after simple algebraic manipulation, 


$$Y_t = \Pi_0 + \Pi_1 I_t + w_t \qquad (19.1.2)$$


¹It is assumed implicitly here that the stochastic disturbances, the u's, are serially uncorrelated. If this is not the case, $Y_{t-1}$ will be correlated with the current-period disturbance term $u_t$. Hence, we cannot treat it as predetermined.
where
$$\Pi_0 = \frac{\beta_0}{1-\beta_1}, \qquad \Pi_1 = \frac{1}{1-\beta_1}, \qquad w_t = \frac{u_t}{1-\beta_1} \qquad (19.1.3)$$


Equation (19.1.2) is a reduced-form equation; it expresses the endogenous variable Y solely as a function of the exogenous (or predetermined) variable I and the stochastic disturbance term u. $\Pi_0$ and $\Pi_1$ are the associated reduced-form coefficients. Notice that these reduced-form coefficients are nonlinear combinations of the structural coefficient(s).

Substituting the value of Y from Eq. (19.1.2) into the consumption function (18.2.3), we obtain another reduced-form equation:


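$$C_t = \Pi_2 + \Pi_3 I_t + w_t \qquad (19.1.4)$$
where
$$\Pi_2 = \frac{\beta_0}{1-\beta_1}, \qquad \Pi_3 = \frac{\beta_1}{1-\beta_1}, \qquad w_t = \frac{u_t}{1-\beta_1} \qquad (19.1.5)$$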


The reduced-form coefficients, such as $\Pi_1$ and $\Pi_3$, are also known as impact, or short-run, multipliers, because they measure the immediate impact on the endogenous variable of a unit change in the value of the exogenous variable.² If in the preceding Keynesian model the investment expenditure is increased by, say, Re 1 and if the MPC is assumed to be 0.8, then from Eq. (19.1.3) we obtain $\Pi_1 = 1/(1 - 0.8) = 5$. This result means that increasing investment by Re 1 will immediately (i.e., in the current time period) lead to an increase in income of Rs. 5, that is, a fivefold increase. Similarly, under the assumed conditions, Eq. (19.1.5) shows that $\Pi_3 = 0.8/(1 - 0.8) = 4$, meaning that a Re 1 increase in investment expenditure will lead immediately to a Rs. 4 increase in consumption expenditure.
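As a quick check on the arithmetic, the short Python sketch below evaluates the impact multipliers $\Pi_1 = 1/(1-\beta_1)$ and $\Pi_3 = \beta_1/(1-\beta_1)$ from Eqs. (19.1.3) and (19.1.5) for an assumed MPC of 0.8. The function name `impact_multipliers` is purely illustrative.

```python
def impact_multipliers(mpc):
    """Return (Pi_1, Pi_3): the immediate effect of a unit change in
    investment I on income Y (Eq. 19.1.3) and on consumption C (Eq. 19.1.5)."""
    if not 0 < mpc < 1:
        raise ValueError("the MPC beta_1 must lie strictly between 0 and 1")
    pi_1 = 1 / (1 - mpc)      # income multiplier, 1 / (1 - beta_1)
    pi_3 = mpc / (1 - mpc)    # consumption multiplier, beta_1 / (1 - beta_1)
    return pi_1, pi_3


pi_1, pi_3 = impact_multipliers(0.8)
print(round(pi_1, 6), round(pi_3, 6))  # 5.0 4.0
```

Because of the identity $Y_t = C_t + I_t$, the two multipliers always differ by exactly 1: a Re 1 rise in investment raises income by the Re 1 itself plus the induced rise in consumption.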

In the context of econometric models, equations such as Eq. (18.2.4) or $Q_t^d = Q_t^s$ (quantity demanded equal to quantity supplied) are known as the equilibrium conditions. Identity (18.2.4) states that aggregate income Y must be equal to aggregate expenditure (i.e., consumption expenditure plus investment expenditure). When equilibrium is achieved, the endogenous variables assume their equilibrium values.

Notice an interesting feature of the reduced-form equations. Since only the predetermined variables and stochastic disturbances appear on the right sides of these equations, and since the predetermined variables are assumed to be uncorrelated with the disturbance terms, the OLS method can be applied to estimate the coefficients of the reduced-form equations (the Π's). From the estimated reduced-form coefficients one may estimate the structural coefficients (the β's), as shown later. This procedure is known as indirect least squares (ILS), and the estimated structural coefficients are called ILS estimates.
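To illustrate the ILS idea before Chapter 20 treats it properly, the sketch below simulates the Keynesian model (18.2.3) and (18.2.4) with NumPy. It estimates the reduced forms (19.1.2) and (19.1.4) by OLS and then solves for the structural coefficients using $\beta_1 = \Pi_3/\Pi_1$ and $\beta_0 = \Pi_0/\Pi_1$, which follow from Eqs. (19.1.3) and (19.1.5). The parameter values, sample size, and distributions are made-up assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical structural parameters (assumed for illustration only)
beta_0, beta_1 = 10.0, 0.8
T = 5_000

invest = rng.uniform(20, 60, size=T)   # exogenous investment I_t
u = rng.normal(0, 2, size=T)           # disturbance u_t

# Generate the endogenous variables from the reduced form, Eq. (19.1.2)
income = (beta_0 + invest + u) / (1 - beta_1)   # Y_t
consumption = income - invest                   # C_t, via identity (18.2.4)

# OLS on each reduced-form equation: regress Y and C on a constant and I
X = np.column_stack([np.ones(T), invest])
(pi_0, pi_1), *_ = np.linalg.lstsq(X, income, rcond=None)
(pi_2, pi_3), *_ = np.linalg.lstsq(X, consumption, rcond=None)

# Indirect least squares: Pi_3 / Pi_1 = beta_1 and Pi_0 / Pi_1 = beta_0
beta_1_ils = pi_3 / pi_1
beta_0_ils = pi_0 / pi_1
print(f"beta_1 (ILS) = {beta_1_ils:.3f}, beta_0 (ILS) = {beta_0_ils:.3f}")
```

The consumption function excludes the exogenous variable I, which is what makes this back-solving possible. Section 19.2 explains why that exclusion matters.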


²In econometric models the exogenous variables play a crucial role. Very often, such variables are under the direct control of the government. Examples are the rates of personal and corporate taxes, subsidies, unemployment compensation, etc.


³For details, see Jan Kmenta, Elements of Econometrics, 2d ed., Macmillan, New York, 1986, pp. 723-731.
We shall study the ILS method in greater detail in Chapter 20. In the meantime, note that since the
reduced-form coefficients can be estimated by the OLS method, and since these coefficients are combinations 
of the structural coefficients, the possibility exists that the structural coefficients can be “retrieved” from the 
reduced-form coefficients, and it is in the estimation of the structural parameters that we may be ultimately 
interested. How does one retrieve the structural coefficients from the reduced-form coefficients? The answer 
is given in Section 19.2, an answer that brings out the crux of the identification problem. 


19.2 The Identification Problem 


By the identification problem we mean whether numerical estimates of the parameters of a structural 
equation can be obtained from the estimated reduced-form coefficients. If this can be done, we say that the 
particular equation is identified. If this cannot be done, then we say that the equation under consideration is 
unidentified, or underidentified. 

An identified equation may be either exactly (or fully or just) identified or overidentified. It is said to be 
exactly identified if unique numerical values of the structural parameters can be obtained. It is said to be 
overidentified if more than one numerical value can be obtained for some of the parameters of the struc- 
tural equations. The circumstances under which each of these cases occurs will be shown in the following 
discussion. 

The identification problem arises because different sets of structural coefficients may be compatible with 
the same set of data. To put the matter differently, a given reduced-form equation may be compatible with 
different structural equations or different hypotheses (models), and it may be difficult to tell which particular 
hypothesis (model) we are investigating. In the remainder of this section we consider several examples to 
show the nature of the identification problem. 


Underidentification 


Consider once again the demand-and-supply model (18.2.1) and (18.2.2), together with the market-clearing, 
or equilibrium, condition that demand is equal to supply. By the equilibrium condition, we obtain 


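$$\alpha_0 + \alpha_1 P_t + u_{1t} = \beta_0 + \beta_1 P_t + u_{2t} \qquad (19.2.1)$$
Solving Eq. (19.2.1), we obtain the equilibrium price
$$P_t = \Pi_0 + v_t \qquad (19.2.2)$$
where
$$\Pi_0 = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} \qquad (19.2.3)$$
$$v_t = \frac{u_{2t} - u_{1t}}{\alpha_1 - \beta_1} \qquad (19.2.4)$$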
Substituting $P_t$ from Eq. (19.2.2) into Eq. (18.2.1) or (18.2.2), we obtain the following equilibrium quantity:
$$Q_t = \Pi_1 + w_t \qquad (19.2.5)$$
where
$$\Pi_1 = \frac{\alpha_1\beta_0 - \alpha_0\beta_1}{\alpha_1 - \beta_1} \qquad (19.2.6)$$
$$w_t = \frac{\alpha_1 u_{2t} - \beta_1 u_{1t}}{\alpha_1 - \beta_1} \qquad (19.2.7)$$
Incidentally, note that the error terms $v_t$ and $w_t$ are linear combinations of the original error terms $u_1$ and $u_2$.

Equations (19.2.2) and (19.2.5) are reduced-form equations. Now our demand-and-supply model contains four structural coefficients, $\alpha_0$, $\alpha_1$, $\beta_0$, and $\beta_1$, but there is no unique way of estimating them. Why? The answer lies in the two reduced-form coefficients given in Eqs. (19.2.3) and (19.2.6). These reduced-form coefficients contain all four structural parameters, but there is no way in which the four structural unknowns can be estimated from only two reduced-form coefficients. Recall from high school algebra that to estimate four unknowns we must have four (independent) equations, and, in general, to estimate k unknowns we must have k (independent) equations. Incidentally, if we run the reduced-form regressions (19.2.2) and (19.2.5), we will see that there are no explanatory variables, only the constants, and these constants will simply give the mean values of P and Q (why?).
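The counting argument can be made concrete. The small Python sketch below evaluates Eqs. (19.2.3) and (19.2.6) for two different, made-up demand functions paired with the same supply function. Both produce identical reduced-form constants. The helper name `reduced_form` and all parameter values are illustrative assumptions.

```python
def reduced_form(alpha_0, alpha_1, beta_0, beta_1):
    """Reduced-form constants of the model (18.2.1)-(18.2.2):
    Pi_0 from Eq. (19.2.3) and Pi_1 from Eq. (19.2.6)."""
    denom = alpha_1 - beta_1
    pi_0 = (beta_0 - alpha_0) / denom
    pi_1 = (alpha_1 * beta_0 - alpha_0 * beta_1) / denom
    return pi_0, pi_1


# Two different (hypothetical) demand curves, same supply curve
model_a = reduced_form(alpha_0=10, alpha_1=-1, beta_0=2, beta_1=1)
model_b = reduced_form(alpha_0=14, alpha_1=-2, beta_0=2, beta_1=1)
print(model_a, model_b)  # (4.0, 6.0) (4.0, 6.0)
```

Both structures are simply different pairs of lines through the same equilibrium point (P, Q) = (4, 6), so data on P and Q alone cannot tell them apart.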

What all this means is that, given time series data on P (price) and Q (quantity) and no other information, there is no way the researcher can guarantee whether he or she is estimating the demand function or the supply function. That is, a given $P_t$ and $Q_t$ represent simply the point of intersection of the appropriate demand-and-supply curves because of the equilibrium condition that demand is equal to supply. To see this clearly, consider the scattergram shown in Figure 19.1.

Figure 19.1a gives a few scatterpoints relating Q to P. Each scatterpoint represents the intersection of a demand and a supply curve, as shown in Figure 19.1b. Now consider a single point, such as that shown in Figure 19.1c. There is no way we can be sure which pair of demand and supply curves, out of the whole family of curves shown in that panel, generated that point. Clearly, some additional information about the nature of the demand and supply curves is needed. For example, if the demand curve shifts over time because of changes in income, tastes, etc., but the supply curve remains relatively stable, as in Figure 19.1d, the scatterpoints trace out a supply curve. In this situation, we say that the supply curve is identified. By the same token, if the supply curve shifts over time because of changes in weather conditions (in the case of agricultural commodities) or other extraneous factors but the demand curve remains relatively stable, as in Figure 19.1e, the scatterpoints trace out a demand curve. In this case, we say that the demand curve is identified.

There is an alternative and perhaps more illuminating way of looking at the identification problem. Suppose we multiply Eq. (18.2.1) by $\lambda$ ($0 \le \lambda \le 1$) and Eq. (18.2.2) by $1 - \lambda$ to obtain the following equations (Note: we drop the superscripts on Q):


$$\lambda Q_t = \lambda\alpha_0 + \lambda\alpha_1 P_t + \lambda u_{1t} \qquad (19.2.8)$$
$$(1-\lambda)Q_t = (1-\lambda)\beta_0 + (1-\lambda)\beta_1 P_t + (1-\lambda)u_{2t} \qquad (19.2.9)$$


Adding these two equations gives the following linear combination of the original demand-and-supply 
equations: 


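$$Q_t = \gamma_0 + \gamma_1 P_t + w_t \qquad (19.2.10)$$
where
$$\gamma_0 = \lambda\alpha_0 + (1-\lambda)\beta_0, \qquad \gamma_1 = \lambda\alpha_1 + (1-\lambda)\beta_1, \qquad w_t = \lambda u_{1t} + (1-\lambda)u_{2t} \qquad (19.2.11)$$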


The “bogus,” or “mongrel,” equation (19.2.10) is observationally indistinguishable from either Eq. (18.2.1) or Eq. (18.2.2) because each involves the regression of Q on P. Therefore, if we have time series data on P and Q only, any of Eqs. (18.2.1), (18.2.2), or (19.2.10) may be compatible with the same data. In other words, the same data may be compatible with the “hypotheses” (18.2.1), (18.2.2), or (19.2.10), and there is no way we can tell which one of these hypotheses we are testing.

Figure 19.1 Hypothetical supply-and-demand functions and the identification problem.

For an equation to be identified, that is, for its parameters to be estimated, it must be shown that the given 
set of data will not produce a structural equation that looks similar in appearance to the one in which we are 
interested. If we set out to estimate the demand function, we must show that the given data are not consistent 
with the supply function or some mongrel equation. 
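The mongrel equation (19.2.10) can be seen at work in a small simulation. Assume the two disturbances are independent with variances $\sigma_1^2$ and $\sigma_2^2$. Then a short derivation from Eqs. (19.2.2) and (19.2.4) shows that the OLS slope of Q on P converges to $\gamma_1 = \lambda\alpha_1 + (1-\lambda)\beta_1$ with weight $\lambda = \sigma_2^2/(\sigma_1^2 + \sigma_2^2)$. That slope is neither the demand slope nor the supply slope. The NumPy sketch below uses made-up parameter values (assumptions for illustration only) to check this.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical structural parameters (assumed for illustration only)
alpha_0, alpha_1 = 10.0, -1.0   # demand function (18.2.1)
beta_0, beta_1 = 2.0, 1.0       # supply function (18.2.2)
sigma_1, sigma_2 = 1.0, 1.0     # std. deviations of u_1t and u_2t (independent)
T = 100_000

u1 = rng.normal(0.0, sigma_1, size=T)
u2 = rng.normal(0.0, sigma_2, size=T)

# Equilibrium price and quantity, Eqs. (19.2.2)-(19.2.7)
price = (beta_0 - alpha_0 + u2 - u1) / (alpha_1 - beta_1)
quantity = alpha_0 + alpha_1 * price + u1

# OLS slope from regressing Q on a constant and P
slope = np.cov(quantity, price)[0, 1] / np.var(price, ddof=1)

# Mongrel slope gamma_1 of Eq. (19.2.11), with lambda implied by the error variances
lam = sigma_2**2 / (sigma_1**2 + sigma_2**2)
mongrel_slope = lam * alpha_1 + (1 - lam) * beta_1

print(f"OLS slope: {slope:.3f}  mongrel gamma_1: {mongrel_slope:.3f}")
```

With equal error variances the fitted line is an even blend of the two curves, here a slope near zero, even though the true demand slope is -1 and the true supply slope is +1.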


Just, or Exact, Identification 


The reason we could not identify the preceding demand function or the supply function was that the same 
variables P and Q are present in both functions and there is no additional information, such as that indicated 
in Figure 19.1d or e. But suppose we consider the following demand-and-supply model: 


Demand function: $Q_t = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + u_{1t}$, $\alpha_1 < 0$, $\alpha_2 > 0$  (19.2.12)
Supply function: $Q_t = \beta_0 + \beta_1 P_t + u_{2t}$, $\beta_1 > 0$  (19.2.13)

where I = income of the consumer, an exogenous variable, and all other variables are as defined previously. Notice that the only difference between the preceding model and our original demand-and-supply model is that there is an additional variable in the demand function, namely, income. From economic theory of demand we know that income is usually an important determinant of demand for most goods and services. Therefore, its inclusion in the demand function will give us some additional information about consumer behavior. For most commodities income is expected to have a positive effect on consumption ($\alpha_2 > 0$).
Using the market-clearing mechanism, quantity demanded = quantity supplied, we have 


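$$\alpha_0 + \alpha_1 P_t + \alpha_2 I_t + u_{1t} = \beta_0 + \beta_1 P_t + u_{2t} \qquad (19.2.14)$$
Solving Eq. (19.2.14) provides the following equilibrium value of $P_t$:
$$P_t = \Pi_0 + \Pi_1 I_t + v_t \qquad (19.2.15)$$
where the reduced-form coefficients are
$$\Pi_0 = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1}, \qquad \Pi_1 = -\frac{\alpha_2}{\alpha_1 - \beta_1} \qquad (19.2.16)$$
and
$$v_t = \frac{u_{2t} - u_{1t}}{\alpha_1 - \beta_1}$$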


Substituting the equilibrium value of P, into the preceding demand or supply function, we obtain the following 
equilibrium quantity: 


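$$Q_t = \Pi_2 + \Pi_3 I_t + w_t \qquad (19.2.17)$$
where
$$\Pi_2 = \frac{\alpha_1\beta_0 - \alpha_0\beta_1}{\alpha_1 - \beta_1}, \qquad \Pi_3 = -\frac{\alpha_2\beta_1}{\alpha_1 - \beta_1} \qquad (19.2.18)$$
and
$$w_t = \frac{\alpha_1 u_{2t} - \beta_1 u_{1t}}{\alpha_1 - \beta_1}$$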

Since Eqs. (19.2.15) and (19.2.17) are both reduced-form equations, the ordinary least squares (OLS) method can be applied to estimate their parameters. Now the demand-and-supply model (19.2.12) and (19.2.13) contains five structural coefficients: $\alpha_0$, $\alpha_1$, $\alpha_2$, $\beta_0$, and $\beta_1$. But there are only four equations to estimate them, namely, the four reduced-form coefficients $\Pi_0$, $\Pi_1$, $\Pi_2$, and $\Pi_3$ given in Eqs. (19.2.16) and (19.2.18). Hence, a unique solution for all the structural coefficients is not possible. But it can be readily shown
that the parameters of the supply function can be identified (estimated) because 


$$\beta_0 = \Pi_2 - \beta_1\Pi_0, \qquad \beta_1 = \frac{\Pi_3}{\Pi_1} \qquad (19.2.19)$$


But there is no unique way of estimating the parameters of the demand function; therefore, it remains underidentified. Incidentally, note that the structural coefficient $\beta_1$ is a nonlinear function of the reduced-form coefficients, which poses some problems when it comes to estimating the standard error of the estimated $\beta_1$, as we shall see in Chapter 20.
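To see Eq. (19.2.19) at work, the sketch below simulates the model (19.2.12) and (19.2.13) with made-up parameter values (all numbers are assumptions chosen only for illustration). It estimates the two reduced-form equations by OLS with NumPy and recovers the supply parameters by indirect least squares. The demand parameters, by contrast, cannot be backed out from the four estimated Π's.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Hypothetical structural parameters (assumed for illustration only)
alpha_0, alpha_1, alpha_2 = 100.0, -2.0, 0.5   # demand (19.2.12)
beta_0, beta_1 = 10.0, 1.5                     # supply (19.2.13)
T = 20_000

income = rng.uniform(50, 150, size=T)   # exogenous I_t
u1 = rng.normal(0, 2, size=T)
u2 = rng.normal(0, 2, size=T)

# Equilibrium values implied by the reduced forms (19.2.15) and (19.2.17)
denom = alpha_1 - beta_1
price = (beta_0 - alpha_0 - alpha_2 * income + u2 - u1) / denom
quantity = beta_0 + beta_1 * price + u2

# OLS on each reduced-form equation (constant and I_t as regressors)
X = np.column_stack([np.ones(T), income])
(pi_0, pi_1), *_ = np.linalg.lstsq(X, price, rcond=None)
(pi_2, pi_3), *_ = np.linalg.lstsq(X, quantity, rcond=None)

# ILS estimates of the supply parameters, Eq. (19.2.19)
beta_1_hat = pi_3 / pi_1
beta_0_hat = pi_2 - beta_1_hat * pi_0
print(f"beta_1 = {beta_1_hat:.3f}, beta_0 = {beta_0_hat:.3f}")
```

The ratio $\Pi_3/\Pi_1$ is a nonlinear function of the estimates. That is why, as the text notes, its standard error needs special treatment.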
To verify that the demand function (19.2.12) cannot be identified (estimated), let us multiply it by $\lambda$ ($0 \le \lambda \le 1$) and Eq. (19.2.13) by $1 - \lambda$ and add them up to obtain the following “mongrel” equation:
$$Q_t = \gamma_0 + \gamma_1 P_t + \gamma_2 I_t + w_t \qquad (19.2.20)$$
where
$$\gamma_0 = \lambda\alpha_0 + (1-\lambda)\beta_0, \qquad \gamma_1 = \lambda\alpha_1 + (1-\lambda)\beta_1, \qquad \gamma_2 = \lambda\alpha_2 \qquad (19.2.21)$$
and
$$w_t = \lambda u_{1t} + (1-\lambda)u_{2t}$$


Equation (19.2.20) is observationally indistinguishable from the demand function (19.2.12), although it is distinguishable from the supply function (19.2.13), which does not contain the variable I as an explanatory variable. Hence, the demand function remains unidentified.

Notice an interesting fact: It is the presence of an additional variable in the demand function that 
enables us to identify the supply function! Why? The inclusion of the income variable in the demand
equation provides us with some additional information about the variability of the function, as indicated in Figure
19.1d. The figure shows how the intersection of the stable supply curve with the shifting demand curve (on 
account of changes in income) enables us to trace (identify) the supply curve. As will be shown shortly, very 
often the identifiability of an equation depends on whether it excludes one or more variables that are included 
in other equations in the model. 

But suppose we consider the following demand-and-supply model: 


Demand function: $Q_t = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + u_{1t}$, $\alpha_1 < 0$, $\alpha_2 > 0$  (19.2.12)
Supply function: $Q_t = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + u_{2t}$, $\beta_1 > 0$, $\beta_2 > 0$  (19.2.22)
where the demand function remains as before but the supply function includes an additional explanatory variable, price lagged one period. The supply function postulates that the quantity of a commodity supplied depends on its current and previous period’s price, a model often used to explain the supply of many agricultural commodities. Note that $P_{t-1}$ is a predetermined variable because its value is known at time t.
By the market-clearing mechanism we have


$$\alpha_0 + \alpha_1 P_t + \alpha_2 I_t + u_{1t} = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + u_{2t} \qquad (19.2.23)$$
Solving this equation, we obtain the following equilibrium price: 
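$$P_t = \Pi_0 + \Pi_1 I_t + \Pi_2 P_{t-1} + v_t \qquad (19.2.24)$$
where
$$\Pi_0 = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1}, \qquad \Pi_1 = -\frac{\alpha_2}{\alpha_1 - \beta_1}, \qquad \Pi_2 = \frac{\beta_2}{\alpha_1 - \beta_1}, \qquad v_t = \frac{u_{2t} - u_{1t}}{\alpha_1 - \beta_1} \qquad (19.2.25)$$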
Substituting the equilibrium price into the demand or supply equation, we obtain the corresponding equilibrium quantity:
$$Q_t = \Pi_3 + \Pi_4 I_t + \Pi_5 P_{t-1} + w_t \qquad (19.2.26)$$
where the reduced-form coefficients are
$$\Pi_3 = \frac{\alpha_1\beta_0 - \alpha_0\beta_1}{\alpha_1 - \beta_1}, \qquad \Pi_4 = -\frac{\alpha_2\beta_1}{\alpha_1 - \beta_1}, \qquad \Pi_5 = \frac{\alpha_1\beta_2}{\alpha_1 - \beta_1} \qquad (19.2.27)$$
and
$$w_t = \frac{\alpha_1 u_{2t} - \beta_1 u_{1t}}{\alpha_1 - \beta_1}$$

The demand-and-supply model given in Eqs. (19.2.12) and (19.2.22) contains six structural coefficients, $\alpha_0$, $\alpha_1$, $\alpha_2$, $\beta_0$, $\beta_1$, and $\beta_2$, and there are six reduced-form coefficients, $\Pi_0$, $\Pi_1$, $\Pi_2$, $\Pi_3$, $\Pi_4$, and $\Pi_5$, to estimate them. Thus, we have six equations in six unknowns, and normally we should be able to obtain unique estimates. Therefore, the parameters of both the demand and supply equations can be identified, and the system as a whole can be identified. (In Exercise 19.2 the reader is asked to express the six structural coefficients in terms of the six reduced-form coefficients given previously to show that unique estimation of the model is possible.)
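A numerical round trip makes the "six equations in six unknowns" claim tangible. The Python sketch below maps an assumed set of structural coefficients to the reduced-form coefficients of Eqs. (19.2.25) and (19.2.27). It then inverts them using one possible route: ratios such as $\Pi_4/\Pi_1 = \beta_1$ and $\Pi_5/\Pi_2 = \alpha_1$. The function names and parameter values are hypothetical, and readers working Exercise 19.2 may prefer to derive the algebra first.

```python
def reduced_form(a0, a1, a2, b0, b1, b2):
    """Reduced-form coefficients Pi_0..Pi_5 from Eqs. (19.2.25) and (19.2.27)."""
    d = a1 - b1
    return (
        (b0 - a0) / d,            # Pi_0
        -a2 / d,                  # Pi_1
        b2 / d,                   # Pi_2
        (a1 * b0 - a0 * b1) / d,  # Pi_3
        -a2 * b1 / d,             # Pi_4
        a1 * b2 / d,              # Pi_5
    )


def structural(p0, p1, p2, p3, p4, p5):
    """Recover (alpha_0, alpha_1, alpha_2, beta_0, beta_1, beta_2) uniquely."""
    b1 = p4 / p1          # (-a2*b1/d) / (-a2/d)
    a1 = p5 / p2          # (a1*b2/d) / (b2/d)
    d = a1 - b1
    a2 = -p1 * d
    b2 = p2 * d
    a0 = p3 - a1 * p0
    b0 = p3 - b1 * p0
    return a0, a1, a2, b0, b1, b2


true_params = (100.0, -2.0, 0.5, 10.0, 1.5, 0.4)
recovered = structural(*reduced_form(*true_params))
print([round(x, 6) for x in recovered])  # [100.0, -2.0, 0.5, 10.0, 1.5, 0.4]
```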

To check that the preceding demand-and-supply functions are identified, we can also resort to the device of multiplying the demand equation (19.2.12) by $\lambda$ ($0 \le \lambda \le 1$) and the supply equation (19.2.22) by $1 - \lambda$ and adding them to obtain a mongrel equation. This mongrel equation will contain both the predetermined variables $I_t$ and $P_{t-1}$; hence, it will be observationally different from the demand as well as the supply equation because the former does not contain $P_{t-1}$ and the latter does not contain $I_t$.


Overidentification


For certain goods and services, income as well as wealth of the consumer is an important determinant of demand. Therefore, let us modify the demand function (19.2.12) as follows, keeping the supply function as before:

Demand function: $Q_t = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + \alpha_3 R_t + u_{1t}$  (19.2.28)
Supply function: $Q_t = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + u_{2t}$, $\beta_1 > 0$, $\beta_2 > 0$  (19.2.22)

where in addition to the variables already defined, R represents wealth; for most goods and services, wealth, like income, is expected to have a positive effect on consumption.
Equating demand to supply, we obtain the following equilibrium price and quantity:
$$P_t = \Pi_0 + \Pi_1 I_t + \Pi_2 R_t + \Pi_3 P_{t-1} + v_t \qquad (19.2.29)$$
$$Q_t = \Pi_4 + \Pi_5 I_t + \Pi_6 R_t + \Pi_7 P_{t-1} + w_t \qquad (19.2.30)$$
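where
$$
\begin{aligned}
\Pi_0 &= \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1}, & \Pi_1 &= -\frac{\alpha_2}{\alpha_1 - \beta_1}, \\
\Pi_2 &= -\frac{\alpha_3}{\alpha_1 - \beta_1}, & \Pi_3 &= \frac{\beta_2}{\alpha_1 - \beta_1}, \\
\Pi_4 &= \frac{\alpha_1\beta_0 - \alpha_0\beta_1}{\alpha_1 - \beta_1}, & \Pi_5 &= -\frac{\alpha_2\beta_1}{\alpha_1 - \beta_1}, \\
\Pi_6 &= -\frac{\alpha_3\beta_1}{\alpha_1 - \beta_1}, & \Pi_7 &= \frac{\alpha_1\beta_2}{\alpha_1 - \beta_1}, \\
w_t &= \frac{\alpha_1 u_{2t} - \beta_1 u_{1t}}{\alpha_1 - \beta_1}, & v_t &= \frac{u_{2t} - u_{1t}}{\alpha_1 - \beta_1}
\end{aligned}
\qquad (19.2.31)
$$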


The preceding demand-and-supply model contains seven structural coefficients, but there are eight 
equations to estimate them—the eight reduced-form coefficients given in Eq. (19.2.31); that is, the number 
of equations is greater than the number of unknowns. As a result, unique estimation of all the parameters of 
our model is not possible, which can be shown easily. From the preceding reduced-form coefficients, we can 
obtain 


$$\beta_1 = \frac{\Pi_6}{\Pi_2} \qquad (19.2.32)$$
or
$$\beta_1 = \frac{\Pi_5}{\Pi_1} \qquad (19.2.33)$$


that is, there are two estimates of the price coefficient in the supply function, and there is no guarantee that these two values or solutions will be identical.⁴ Moreover, since $\beta_1$ appears in the denominators of all the reduced-form coefficients, the ambiguity in the estimation of $\beta_1$ will be transmitted to other estimates too.
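A simulation shows the two competing estimates side by side. The NumPy sketch below generates data from the overidentified model (19.2.28) and (19.2.22) and fits the reduced forms (19.2.29) and (19.2.30) by OLS. It then computes $\beta_1$ both ways, from Eq. (19.2.32) and from Eq. (19.2.33). All parameter values are assumptions for illustration. As a further simplification, $P_{t-1}$ is drawn independently rather than generated as a genuine lagged price series.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Hypothetical structural parameters (assumed for illustration only)
a0, a1, a2, a3 = 100.0, -2.0, 0.5, 0.05   # demand (19.2.28)
b0, b1, b2 = 10.0, 1.5, 0.4               # supply (19.2.22)
T = 500

income = rng.uniform(50, 150, size=T)       # I_t
wealth = rng.uniform(200, 1200, size=T)     # R_t
lagged_price = rng.uniform(20, 40, size=T)  # P_{t-1}, simplified as independent draws
u1 = rng.normal(0, 2, size=T)
u2 = rng.normal(0, 2, size=T)

# Equilibrium price and quantity
d = a1 - b1
price = (b0 - a0 - a2 * income - a3 * wealth + b2 * lagged_price + u2 - u1) / d
quantity = b0 + b1 * price + b2 * lagged_price + u2

# OLS on both reduced-form equations
X = np.column_stack([np.ones(T), income, wealth, lagged_price])
pi_p, *_ = np.linalg.lstsq(X, price, rcond=None)     # Pi_0 .. Pi_3
pi_q, *_ = np.linalg.lstsq(X, quantity, rcond=None)  # Pi_4 .. Pi_7

beta_1_from_wealth = pi_q[2] / pi_p[2]   # Pi_6 / Pi_2, Eq. (19.2.32)
beta_1_from_income = pi_q[1] / pi_p[1]   # Pi_5 / Pi_1, Eq. (19.2.33)
print(f"{beta_1_from_wealth:.4f} vs {beta_1_from_income:.4f}")
```

Both numbers land near the assumed value of 1.5, but in any finite sample they differ. ILS offers no principled way to choose between them, which is the motivation for the methods of Chapter 20.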

Why was the supply function identified in the system (19.2.12) and (19.2.22) but not in the system (19.2.28) 
and (19.2.22), although in both cases the supply function remains the same? The answer is that we have “too 
much,” or an oversufficiency of information, to identify the supply curve. This situation is the opposite of 
the case of underidentification, where there is too little information. The oversufficiency of the information 
results from the fact that in the model (19.2.12) and (19.2.22) the exclusion of the income variable from 
the supply function was enough to identify it, but in the model (19.2.28) and (19.2.22) the supply function 
excludes not only the income variable but also the wealth variable. In other words, in the latter model we put 
“too many” restrictions on the supply function by requiring it to exclude more variables than necessary to 
identify it. However, this situation does not imply that overidentification is necessarily bad because we shall 
see in Chapter 20 how we can handle the problem of too much information, or too many restrictions. 

We have now exhausted all the cases. As the preceding discussion shows, an equation in a simultaneous- 
equation model may be underidentified or identified (either over- or just). The model as a whole is identified 
if each equation in it is identified. To secure identification, we resort to the reduced-form equations. But in 
Section 19.3, we consider an alternative and perhaps less time-consuming method of determining whether or 
not an equation in a simultaneous-equation model is identified. 


⁴Notice the difference between under- and overidentification. In the former case, it is impossible to obtain estimates of
the structural parameters, whereas in the latter case, there may be several estimates of one or more structural coefficients. 
19.3 Rules for Identification


As the examples in Section 19.2 show, in principle it is possible to resort to the reduced-form equations
to determine the identification of an equation in a system of simultaneous equations. But these examples 
also show how time-consuming and laborious the process can be. Fortunately, it is not essential to use this 
procedure. The so-called order and rank conditions of identification lighten the task by providing a 
systematic routine. 

To understand the order and rank conditions, we introduce the following notations: 


M = number of endogenous variables in the model 

m = number of endogenous variables in a given equation 

K = number of predetermined variables in the model including the intercept 
k = number of predetermined variables in a given equation 


The Order Condition of Identifiability⁵


A necessary (but not sufficient) condition of identification, known as the order condition, may be stated in 
two different but equivalent ways as follows (the necessary as well as sufficient condition of identification 
will be presented shortly): 


Definition 19.1 


In a model of M simultaneous equations, in order for an equation to be identified, it must exclude at least $M - 1$ variables (endogenous as well as predetermined) appearing in the model. If it excludes exactly $M - 1$ variables, the equation is just identified. If it excludes more than $M - 1$ variables, it is overidentified.


Definition 19.2 


In a model of M simultaneous equations, in order for an equation to be identified, the number of predetermined variables excluded from the equation must not be less than the number of endogenous variables included in that equation less 1, that is,
$$K - k \ge m - 1 \qquad (19.3.1)$$
If $K - k = m - 1$, the equation is just identified, but if $K - k > m - 1$, it is overidentified.


In Exercise 19.1 the reader is asked to prove that the preceding two definitions of identification are equiv- 
alent. 
To illustrate the order condition, let us revert to our previous examples. 


Example 19.1 


Demand function: $Q_t^d = \alpha_0 + \alpha_1 P_t + u_{1t}$  (18.2.1)
Supply function: $Q_t^s = \beta_0 + \beta_1 P_t + u_{2t}$  (18.2.2)

This model has two endogenous variables P and Q and no predetermined variables. To be identified, each of these equations must exclude at least $M - 1 = 1$ variable. Since this is not the case, neither equation is identified.




⁵The term order refers to the order of a matrix, that is, the number of rows and columns present in a matrix. See
Appendix B. 
Example 19.2


Demand function: $Q_t^d = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + u_{1t}$  (19.2.12)
Supply function: $Q_t^s = \beta_0 + \beta_1 P_t + u_{2t}$  (19.2.13)

In this model Q and P are endogenous and I is exogenous. Applying the order condition given in Eq. (19.3.1), we see that the demand function is unidentified. On the other hand, the supply function is just identified because it excludes exactly $M - 1 = 1$ variable, $I_t$.


Example 19.3 


Demand function: $Q_t^d = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + u_{1t}$  (19.2.12)
Supply function: $Q_t^s = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + u_{2t}$  (19.2.22)

Given that $P_t$ and $Q_t$ are endogenous and $I_t$ and $P_{t-1}$ are predetermined, Eq. (19.2.12) excludes exactly one variable, $P_{t-1}$, and Eq. (19.2.22) also excludes exactly one variable, $I_t$. Hence each equation is identified by the order condition. Therefore, the model as a whole is identified.


Example 19.4 


Demand function: $Q_t^d = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + \alpha_3 R_t + u_{1t}$  (19.2.28)
Supply function: $Q_t^s = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + u_{2t}$  (19.2.22)

In this model $P_t$ and $Q_t$ are endogenous and $I_t$, $R_t$, and $P_{t-1}$ are predetermined. The demand function excludes exactly one variable, $P_{t-1}$, and hence by the order condition it is exactly identified. But the supply function excludes two variables, $I_t$ and $R_t$, and hence it is overidentified. As noted before, in this case there are two ways of estimating $\beta_1$, the coefficient of the price variable.

Notice a slight complication here. By the order condition the demand function is identified. But if we try 
to estimate the parameters of this equation from the reduced-form coefficients given in Eq. (19.2.31), the 
estimates will not be unique because $\beta_1$, which enters into the computations, takes two values and we shall
have to decide which of these values is appropriate. But this complication can be obviated because it is shown 
in Chapter 20 that in cases of overidentification the method of indirect least squares is not appropriate and 
should be discarded in favor of other methods. One such method is two-stage least squares, which we 
shall discuss fully in Chapter 20. 


As the previous examples show, identification of an equation in a model of simultaneous equations 
is possible if that equation excludes one or more variables that are present elsewhere in the model. This 
situation is known as the exclusion (of variables) criterion, or the zero restrictions criterion (the coefficients 
of variables not appearing in an equation are assumed to have zero values). This criterion is by far the most 
commonly used method of securing or determining identification of an equation. But notice that the zero 
restrictions criterion is based on a priori or theoretical expectations that certain variables do not appear in a 
given equation. It is up to the researcher to spell out clearly why he or she does expect certain variables to 
appear in some equations and not in others. 
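The order-condition bookkeeping in Examples 19.1 to 19.4 is mechanical enough to code. The Python sketch below applies Eq. (19.3.1), counting the intercept as a predetermined variable in both K and k, in line with the notation above. The helper name `order_condition` is hypothetical. Remember that passing this check is necessary but not sufficient: the rank condition below has the final word.

```python
def order_condition(K, k, m):
    """Classify one equation by the order condition K - k >= m - 1 (Eq. 19.3.1).

    K: predetermined variables in the model (including the intercept)
    k: predetermined variables in the equation (including the intercept)
    m: endogenous variables in the equation
    """
    excluded, needed = K - k, m - 1
    if excluded < needed:
        return "underidentified"
    return "just identified" if excluded == needed else "overidentified"


print(order_condition(K=1, k=1, m=2))  # Ex. 19.1, either equation: underidentified
print(order_condition(K=2, k=2, m=2))  # Ex. 19.2, demand: underidentified
print(order_condition(K=2, k=1, m=2))  # Ex. 19.2, supply: just identified
print(order_condition(K=3, k=2, m=2))  # Ex. 19.3, either equation: just identified
print(order_condition(K=4, k=3, m=2))  # Ex. 19.4, demand: just identified
print(order_condition(K=4, k=2, m=2))  # Ex. 19.4, supply: overidentified
```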
The Rank Condition of Identifiability⁶


The order condition discussed previously is a necessary but not sufficient condition for identification; that is, even if it is satisfied, it may happen that an equation is not identified. Thus, in Example 19.2, the supply equation was identified by the order condition because it excluded the income variable $I_t$, which appeared in the demand function. But identification is accomplished only if $\alpha_2$, the coefficient of $I_t$ in the demand function, is not zero, that is, if the income variable not only probably but actually does enter the demand function.

More generally, even if the order condition $K - k \ge m - 1$ is satisfied by an equation, it may be unidentified because the predetermined variables excluded from this equation but present in the model may not all be independent, so that there may not be a one-to-one correspondence between the structural coefficients (the β's) and the reduced-form coefficients (the Π's). That is, we may not be able to estimate the structural parameters from the reduced-form coefficients, as we shall show shortly. Therefore, we need a condition for identification that is both necessary and sufficient. This is provided by the rank condition of identification, which may be stated as follows:


Rank Condition of Identification: In a model containing M equations in M endogenous variables, an equation is identified if and only if at least one nonzero determinant of order $(M - 1) \times (M - 1)$ can be constructed from the coefficients of the variables (both endogenous and predetermined) excluded from that particular equation but included in the other equations of the model.


As an illustration of the rank condition of identification, consider the following hypothetical system of 
simultaneous equations in which the Y variables are endogenous and the X variables are predetermined.⁷


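$$Y_{1t} - \beta_{10} - \beta_{12}Y_{2t} - \beta_{13}Y_{3t} - \gamma_{11}X_{1t} = u_{1t} \qquad (19.3.2)$$
$$Y_{2t} - \beta_{20} - \beta_{23}Y_{3t} - \gamma_{21}X_{1t} - \gamma_{22}X_{2t} = u_{2t} \qquad (19.3.3)$$
$$Y_{3t} - \beta_{30} - \beta_{31}Y_{1t} - \gamma_{31}X_{1t} - \gamma_{32}X_{2t} = u_{3t} \qquad (19.3.4)$$
$$Y_{4t} - \beta_{40} - \beta_{41}Y_{1t} - \beta_{42}Y_{2t} - \gamma_{43}X_{3t} = u_{4t} \qquad (19.3.5)$$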


To facilitate identification, let us write the preceding system in Table 19.1, which is self-explanatory.

Let us first apply the order condition of identification, as shown in Table 19.2. By the order condition each equation is identified. Let us recheck with the rank condition. Consider the first equation, which excludes variables $Y_4$, $X_2$, and $X_3$ (this is represented by zeros in the first row of Table 19.1). For this equation to be identified, we must obtain at least one nonzero determinant of order 3 × 3 from the coefficients of the variables excluded from this equation but included in other equations. To obtain the determinant we first obtain the relevant matrix of coefficients of variables $Y_4$, $X_2$, and $X_3$ included in the other equations. In the present case there is only one such matrix, call it A, defined as follows:


6The term rank refers to the rank of a matrix and is given by the order of the largest square submatrix (contained
in the given matrix) whose determinant is nonzero. Alternatively, the rank of a matrix is the largest number of
linearly independent rows or columns of that matrix. See Appendix B.


7The simultaneous-equation system presented in Eq. (19.1.1) may be shown in the following alternative form, which may
be convenient for matrix manipulations.

Table 19.1
Coefficients of the Variables

Equation No.    1       Y1      Y2      Y3      Y4      X1      X2      X3
(19.3.2)       -β10     1      -β12    -β13     0      -γ11     0       0
(19.3.3)       -β20     0       1      -β23     0      -γ21    -γ22     0
(19.3.4)       -β30    -β31     0       1       0      -γ31    -γ32     0
(19.3.5)       -β40    -β41    -β42     0       1       0       0      -γ43

Table 19.2

                No. of Predetermined     No. of Endogenous
                Variables Excluded,      Variables Included,
Equation No.    (K - k)                  Less One, (m - 1)       Identified?
(19.3.2)        2                        2                       Exactly
(19.3.3)        1                        1                       Exactly
(19.3.4)        1                        1                       Exactly
(19.3.5)        2                        2                       Exactly

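$$A = \begin{bmatrix} 0 & -\gamma_{22} & 0 \\ 0 & -\gamma_{32} & 0 \\ 1 & 0 & -\gamma_{43} \end{bmatrix} \tag{19.3.6}$$

It can be seen that the determinant of this matrix is zero, because its first two rows are linearly dependent
(each has a nonzero entry only in the second column):

$$\det A = \begin{vmatrix} 0 & -\gamma_{22} & 0 \\ 0 & -\gamma_{32} & 0 \\ 1 & 0 & -\gamma_{43} \end{vmatrix} = 0 \tag{19.3.7}$$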


Since the determinant is zero, the rank of the matrix (19.3.6), denoted by ρ(A), is less than 3. Therefore, Eq.
(19.3.2) does not satisfy the rank condition and hence is not identified.
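To see concretely why Eq. (19.3.7) holds, the minimal Python sketch below expands the 3 × 3 determinant by cofactors and evaluates it for a few arbitrary values of γ22, γ32, and γ43. These values are purely hypothetical; because the first two rows of A have a nonzero entry only in the same column, the determinant vanishes whatever values the γ's take. The function names are illustrative.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def matrix_a(g22, g32, g43):
    """Matrix A of Eq. (19.3.6): coefficients of Y4, X2, X3 in Eqs. (19.3.3)-(19.3.5)."""
    return [[0, -g22, 0],
            [0, -g32, 0],
            [1, 0, -g43]]


# Hypothetical gamma values; every choice yields a zero determinant.
for gammas in [(1, 2, 3), (Fraction(1, 2), 7, -4), (0.3, 1.9, 2.5)]:
    print(gammas, det3(matrix_a(*gammas)))
```

Integer, rational, and floating-point inputs all give zero, which is what the rank condition exploits: no nonzero 3 × 3 determinant exists for Eq. (19.3.2).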

As noted, the rank condition is both a necessary and sufficient condition for identification. Therefore,
although the order condition shows that Eq. (19.3.2) is identified, the rank condition shows that it is not.
Evidently, the columns or rows of the matrix A given in Eq. (19.3.6) are not (linearly) independent, meaning
that there is some relationship between the variables Y4, X2, and X3. As a result, we may not have enough
information to estimate the parameters of equation (19.3.2); the reduced-form equations for the preceding
model will show that it is not possible to obtain the structural coefficients of that equation from the
reduced-form coefficients. The reader should verify that by the rank condition Eqs. (19.3.3) and (19.3.4) are
also unidentified but Eq. (19.3.5) is identified.

As the preceding discussion shows, the rank condition tells us whether the equation under consideration 
is identified or not, whereas the order condition tells us if it is exactly identified or overidentified. 

To apply the rank condition one may proceed as follows: 


1. Write down the system in a tabular form, as shown in Table 19.1.
2. Strike out the coefficients of the row in which the equation under consideration appears.
3. Also strike out the columns corresponding to those coefficients in step 2 which are nonzero.
4. The entries left in the table will then give only the coefficients of the variables included in the system
but not in the equation under consideration. From these entries form all possible matrices, like A, of
order M - 1 and obtain the corresponding determinants. If at least one nonvanishing or nonzero
determinant can be found, the equation in question is (just or over-) identified. The rank of the matrix, say,
A, in this case is exactly equal to M - 1. If all the possible (M - 1) × (M - 1) determinants are zero, the
rank of the matrix A is less than M - 1 and the equation under investigation is not identified.
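The four steps can be mechanized. The Python sketch below is a minimal illustration, not production code. It stores Table 19.1 as a coefficient matrix, strikes out the equation's own row and every column where that row is nonzero, and computes the rank of what remains by exact Gaussian elimination using Python's `fractions` module. Instead of listing every (M - 1) × (M - 1) determinant, it computes the rank directly, which is equivalent. Only the zero/nonzero pattern comes from Table 19.1; the numeric values standing in for the β's and γ's are hypothetical. The function also counts K - k and m - 1, so it reproduces Table 19.2 as well.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix via Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0                                   # number of pivots found so far
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):             # clear column c in every other row
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


# Table 19.1 with hypothetical nonzero numbers in place of the betas and gammas.
# Columns: intercept, Y1, Y2, Y3, Y4, X1, X2, X3.
N_ENDOG = 4
TABLE = [
    [-1, 1, -2, -3, 0, -2, 0, 0],         # (19.3.2)
    [-4, 0, 1, -5, 0, -7, -11, 0],        # (19.3.3)
    [-6, -8, 0, 1, 0, -13, -17, 0],       # (19.3.4)
    [-9, -10, -12, 0, 1, 0, 0, -19],      # (19.3.5)
]


def check_equation(table, eq, n_endog):
    """Return (K - k, m - 1, rank of A, rank condition met?) for equation `eq`."""
    M = len(table)
    row = table[eq]
    # Step 3: keep only the columns where this equation's coefficient is zero
    kept = [j for j, v in enumerate(row) if v == 0]
    # Step 2: drop the equation's own row; what remains is the matrix A
    A = [[table[i][j] for j in kept] for i in range(M) if i != eq]
    m = sum(1 for v in row[1:1 + n_endog] if v != 0)   # endogenous included
    k = sum(1 for v in row[1 + n_endog:] if v != 0)    # predetermined included
    K = len(row) - 1 - n_endog                         # predetermined in model
    r = rank(A)                                        # Step 4
    return K - k, m - 1, r, r == M - 1


for eq, label in enumerate(["19.3.2", "19.3.3", "19.3.4", "19.3.5"]):
    print(label, check_equation(TABLE, eq, N_ENDOG))
```

All four equations satisfy the order counts of Table 19.2, but only Eq. (19.3.5) reaches rank M - 1 = 3, which matches the conclusion in the text.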


Our discussion of the order and rank conditions of identification leads to the following general principles 
of identifiability of a structural equation in a system of M simultaneous equations: 


1. If K - k > m - 1 and the rank of the A matrix is M - 1, the equation is overidentified.

2. If K - k = m - 1 and the rank of the matrix A is M - 1, the equation is exactly identified.

3. If K - k ≥ m - 1 and the rank of the matrix A is less than M - 1, the equation is underidentified.

4. If K - k < m - 1, the structural equation is unidentified. The rank of the A matrix in this case is bound to
be less than M - 1. (Why?)
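The four rules amount to a small decision procedure. The sketch below is one possible Python encoding; it takes the counts K - k and m - 1 and the rank of A as inputs (computing the rank is a separate step), and the function and parameter names are illustrative only. Rule 4 is checked first, since the text notes that the rank is then bound to fall short of M - 1 anyway.

```python
def classify(k_excl, m_incl_less_one, rank_a, M):
    """Apply the four identification rules stated in the text.

    k_excl          -- K - k, predetermined variables excluded from the equation
    m_incl_less_one -- m - 1, endogenous variables included, less one
    rank_a          -- rank of the matrix A for this equation
    M               -- number of equations (and of endogenous variables)
    """
    if k_excl < m_incl_less_one:
        return "unidentified"           # rule 4: order condition fails
    if rank_a < M - 1:
        return "underidentified"        # rule 3: order condition met, rank condition fails
    if k_excl == m_incl_less_one:
        return "exactly identified"     # rule 2
    return "overidentified"             # rule 1: K - k > m - 1 with full rank


# Counts from Table 19.2 and the ranks found in the discussion above (M = 4)
print(classify(2, 2, 2, 4))   # Eq. (19.3.2)
print(classify(2, 2, 3, 4))   # Eq. (19.3.5)
```

With these inputs Eq. (19.3.2) comes out underidentified and Eq. (19.3.5) exactly identified, in line with the text.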


Henceforth, when we talk about identification we mean exact identification or overidentification. There is
no point in considering unidentified, or underidentified, equations because no matter how extensive the data,
the structural parameters cannot be estimated. Besides, most simultaneous-equation systems in economics
and finance are overidentified rather than underidentified, so we need not worry too much about
underidentification. As shown in Chapter 20, parameters of overidentified as well as just identified equations
can be estimated.

Which condition should one use in practice: Order or rank? For large simultaneous-equation models, 
applying the rank condition is a formidable task. Therefore, as Harvey notes, 


Fortunately, the order condition is usually sufficient to ensure identifiability, and although it is important to be
aware of the rank condition, a failure to verify it will rarely result in disaster.⁸


*19.4 A Test of Simultaneity⁹


If there is no simultaneous-equation, or simultaneity, problem, OLS produces consistent and efficient
estimators. On the other hand, if there is simultaneity, OLS estimators are not even consistent. In the
presence of simultaneity, as we will show in Chapter 20, the methods of two-stage least squares (2SLS) and
instrumental variables (IV) will give estimators that are consistent and efficient. Oddly, if we apply these
alternative methods when there is in fact no simultaneity, these methods yield estimators that are consistent
but not efficient (i.e., they have a larger variance than OLS). This discussion suggests that we should check
for the simultaneity problem before we discard OLS in favor of the alternatives.

As we showed earlier, the simultaneity problem arises because some of the regressors are endogenous and
are therefore likely to be correlated with the disturbance, or error, term. Therefore, a test of simultaneity is
essentially a test of whether an (endogenous) regressor is correlated with the error term. If it is, the
simultaneity problem exists, in which case alternatives to OLS must be found; if it is not, we can use OLS. To
find out which is the case in a concrete situation, we can use Hausman's specification error test.


*Optional. 
8Andrew Harvey, The Econometric Analysis of Time Series, 2d ed., The MIT Press, Cambridge, Mass., 1990, p. 328.


9The following discussion draws from Robert S. Pindyck and Daniel L. Rubinfeld, Econometric Models and Economic Forecasts,
3d ed., McGraw-Hill, New York, 1991, pp. 303-305.

Hausman Specification Test
A version of the Hausman specification error test that can be used for testing the simultaneity problem can
be explained as follows:¹⁰
To fix ideas, consider the following two-equation model:


$$\text{Demand function:}\quad Q_t^d = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + \alpha_3 R_t + u_{1t} \tag{19.4.1}$$
$$\text{Supply function:}\quad Q_t^s = \beta_0 + \beta_1 P_t + u_{2t} \tag{19.4.2}$$

where P = price
Q = quantity
I = income
R = wealth
u's = error terms

Assume that I and R are exogenous. Of course, P and Q are endogenous.

Now consider the supply function (19.4.2). If there is no simultaneity problem (i.e., P and Q are mutually
independent), Pt and u2t should be uncorrelated (why?). On the other hand, if there is simultaneity, Pt and u2t
will be correlated. To find out which is the case, the Hausman test proceeds as follows:

First, from Eqs. (19.4.1) and (19.4.2) we obtain the following reduced-form equations:

$$P_t = \Pi_0 + \Pi_1 I_t + \Pi_2 R_t + v_t \tag{19.4.3}$$
$$Q_t = \Pi_3 + \Pi_4 I_t + \Pi_5 R_t + w_t \tag{19.4.4}$$

where v and w are the reduced-form error terms. Estimating Eq. (19.4.3) by OLS we obtain

$$\hat{P}_t = \hat{\Pi}_0 + \hat{\Pi}_1 I_t + \hat{\Pi}_2 R_t \tag{19.4.5}$$

Therefore,

$$P_t = \hat{P}_t + \hat{v}_t \tag{19.4.6}$$

where P̂t are estimated Pt and v̂t are the estimated residuals. Now consider the following equation:

$$Q_t = \beta_0 + \beta_1 \hat{P}_t + \beta_1 \hat{v}_t + u_{2t} \tag{19.4.7}$$

Note: The coefficients of P̂t and v̂t are the same. The difference between this equation and the original supply
equation is that it includes the additional variable v̂t, the residual from regression (19.4.3).

Now, if the null hypothesis is that there is no simultaneity, that is, Pt is not an endogenous variable, the
correlation between v̂t and u2t should be zero, asymptotically. Thus, if we run the regression (19.4.7) and
find that the coefficient of v̂t in Eq. (19.4.7) is statistically zero, we can conclude that there is no simultaneity
problem. Of course, this conclusion will be reversed if we find this coefficient to be statistically significant.
In passing, note that Hausman's simultaneity test is also known as the Hausman test of endogeneity. In the
present example we want to find out if Pt is endogenous. If it is, we have the simultaneity problem.

Essentially, then, the Hausman test involves the following steps: 


10J. A. Hausman, “Specification Tests in Econometrics,” Econometrica, vol. 46, November 1978, pp. 1251-1271. See also A.
Nakamura and M. Nakamura, “On the Relationship among Several Specification Error Tests Presented by Durbin, Wu, and
Hausman,” Econometrica, vol. 49, November 1981, pp. 1583-1588.

Step 1. Regress Pt on It and Rt to obtain v̂t.

Step 2. Regress Qt on P̂t and v̂t and perform a t test on the coefficient of v̂t. If it is significant, reject the
null hypothesis of no simultaneity; otherwise, do not reject it.¹¹ For efficient estimation, however, Pindyck and
Rubinfeld suggest regressing Qt on Pt and v̂t.¹²
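To see the mechanics, here is a minimal simulation sketch in plain Python, assuming no third-party packages. It generates data from the demand and supply model (19.4.1)-(19.4.2) with hypothetical parameter values, sample size, and seed, so that Pt is endogenous by construction. It then runs Step 1 and the Pindyck–Rubinfeld form of Step 2 (Qt on Pt and v̂t). In that form the coefficient of v̂t should be near zero when price is exogenous, so a large t ratio signals simultaneity. The helper names `solve`, `ols`, `simulate`, and `hausman` are illustrative and do not come from any library.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(y, X):
    """OLS coefficients, standard errors, and residuals; X must include a constant."""
    n, k = len(X), len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    # j-th diagonal element of (X'X)^-1, obtained by solving against a unit vector
    se = [math.sqrt(s2 * solve(XtX, [float(i == j) for i in range(k)])[j]) for j in range(k)]
    return beta, se, resid


def simulate(n=2000, seed=1):
    """Draw (I, R, P, Q) from Eqs. (19.4.1)-(19.4.2) with hypothetical parameters."""
    rng = random.Random(seed)
    a0, a1, a2, a3 = 10.0, -1.0, 2.0, 2.0   # demand: intercept, price, income, wealth
    b0, b1 = 2.0, 1.0                        # supply: intercept, price
    data = []
    for _ in range(n):
        inc, wealth = rng.gauss(0, 1), rng.gauss(0, 1)
        u1, u2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Market clearing (demand = supply) solved for price, so P depends on u2
        p = (a0 - b0 + a2 * inc + a3 * wealth + u1 - u2) / (b1 - a1)
        q = b0 + b1 * p + u2
        data.append((inc, wealth, p, q))
    return data


def hausman(data):
    """Return the Step 2 coefficients and the t ratio on v_hat."""
    inc, wealth, price, qty = zip(*data)
    # Step 1: reduced-form regression of P on a constant, I and R; keep v_hat
    _, _, v_hat = ols(price, [[1.0, i, r] for i, r in zip(inc, wealth)])
    # Step 2 (Pindyck-Rubinfeld form): regress Q on a constant, P and v_hat
    beta, se, _ = ols(qty, [[1.0, p, v] for p, v in zip(price, v_hat)])
    return beta, beta[2] / se[2]


beta, t_v = hausman(simulate())
print(f"supply slope {beta[1]:.3f}, coefficient on v_hat {beta[2]:.3f}, t = {t_v:.1f}")
```

Because u2t enters the price equation, the t ratio on v̂t lands far beyond conventional critical values, while the coefficient on Pt stays close to the supply slope of 1 used to generate the data.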

There are alternative ways to apply the Hausman test, which are given by way of an exercise. 


Example 19.5 Pindyck–Rubinfeld Model of Public Spending¹³


To study the behavior of U.S. state and local government expenditure, the authors developed the following 
simultaneous-equation model: 


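$$\text{EXP} = \beta_1 + \beta_2\,\text{AID} + \beta_3\,\text{INC} + \beta_4\,\text{POP} + u_i \tag{19.4.8}$$

$$\text{AID} = \delta_1 + \delta_2\,\text{EXP} + \delta_3\,\text{PS} + v_i \tag{19.4.9}$$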

where EXP = state and local government public expenditures 

AID = level of federal grants-in-aid 

INC = income of states 

POP = state population 

PS = population of primary and secondary school children 
u and v = error terms 

In this model, INC, POP, and PS are regarded as exogenous. 

Because of the possibility of simultaneity between EXP and AID, the authors first regress AID on INC, POP,
and PS (i.e., the reduced-form regression). Let the error term in this regression be wi. From this regression the
calculated residual is ŵi. The authors then regress EXP on AID, INC, POP, and ŵi, to obtain the following
results:¹⁴


$$\widehat{\text{EXP}} = -89.41 + 4.50\,\text{AID} + 0.00013\,\text{INC} - 0.518\,\text{POP} - 1.39\,\hat{w}_i \tag{19.4.10}$$
t = (-1.04) (5.89) (3.06) (-4.63) (-1.73)
R² = 0.99


At the 5 percent level of significance, the coefficient of ŵi is not statistically significant, and therefore, at
this level, there is no simultaneity problem. However, at the 10 percent level of significance, it is statistically
significant, raising the possibility that the simultaneity problem is present.
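The 5 versus 10 percent verdict can be checked from the t ratio of -1.73 alone. The degrees of freedom are not reported here, so the sketch below assumes a large-sample standard normal approximation instead of the exact t distribution. With finite degrees of freedom the exact p-value is somewhat larger, so treat this as an approximate check only. The function name is illustrative.

```python
import math


def two_sided_p_normal(t_ratio):
    """Two-sided p-value for a t ratio under a standard normal approximation."""
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)), Phi being the standard normal CDF
    return math.erfc(abs(t_ratio) / math.sqrt(2))


p = two_sided_p_normal(-1.73)   # t ratio of w_hat in Eq. (19.4.10)
print(f"p = {p:.3f}; significant at 10%: {p < 0.10}; at 5%: {p < 0.05}")
```

The approximate p-value falls between 0.05 and 0.10, consistent with the conclusion stated above.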
Incidentally, the OLS estimation of Eq. (19.4.8) is as follows: 


$$\widehat{\text{EXP}} = -46.81 + 3.24\,\text{AID} + 0.00019\,\text{INC} - 0.597\,\text{POP} \tag{19.4.11}$$
t = (-0.56) (13.64) (8.12) (…)
R² = 0.993


Notice an interesting feature of the results given in Eqs. (19.4.10) and (19.4.11): When simultaneity is explicitly
taken into account, the AID variable is less significant (t = 5.89 versus 13.64), although numerically its
coefficient is greater in magnitude (4.50 versus 3.24).


11If more than one endogenous regressor is involved, we will have to use the F test.
12Pindyck and Rubinfeld, op. cit., p. 304. Note: The regressor is Pt and not P̂t.
13Pindyck and Rubinfeld, op. cit., pp. 176-177. Notations slightly altered.

14As in footnote 12, the authors use AID rather than its fitted value as the regressor.

*19.5 Tests for Exogeneity


We noted earlier that it is the researcher’s responsibility to specify which variables are endogenous and which 
are exogenous. This will depend on the problem at hand and the a priori information the researcher has. But 
is it possible to develop a statistical test of exogeneity, in the manner of Granger’s causality test? 

The Hausman test discussed in Section 19.4 can be utilized to answer this question. Suppose we have a
three-equation model in three endogenous variables, Y1, Y2, and Y3, and suppose there are three exogenous
variables, X1, X2, and X3. Further, suppose that the first equation of the model is


$$Y_{1i} = \beta_0 + \beta_2 Y_{2i} + \beta_3 Y_{3i} + \alpha_1 X_{1i} + u_{1i} \tag{19.5.1}$$


If Y2 and Y3 are truly endogenous, we cannot estimate Eq. (19.5.1) by OLS (why?). But how do we find that
out? We can proceed as follows. We obtain the reduced-form equations for Y2 and Y3 (Note: the reduced-form
equations will have only predetermined variables on the right-hand side). From these reduced-form
equations, we obtain Ŷ2i and Ŷ3i, the predicted values of Y2i and Y3i, respectively. Then, in the spirit of the
Hausman test discussed earlier, we can estimate the following equation by OLS:


$$Y_{1i} = \beta_0 + \beta_2 Y_{2i} + \beta_3 Y_{3i} + \alpha_1 X_{1i} + \lambda_2 \hat{Y}_{2i} + \lambda_3 \hat{Y}_{3i} + u_{1i} \tag{19.5.2}$$

Using the F test, we test the hypothesis that λ2 = λ3 = 0. If this hypothesis is rejected, Y2 and Y3 can be deemed
endogenous, but if it is not rejected, they can be treated as exogenous. For a concrete example, see Exercise
19.16.
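The F test here compares the residual sum of squares of the unrestricted Eq. (19.5.2) with that of Eq. (19.5.1), which is Eq. (19.5.2) with λ2 = λ3 = 0 imposed. A minimal sketch of the standard restricted-versus-unrestricted F statistic follows; the residual sums of squares and the sample size are hypothetical numbers used only to show the arithmetic, and the function name is illustrative.

```python
def f_statistic(rss_restricted, rss_unrestricted, q, n, k):
    """F statistic for q linear restrictions.

    rss_restricted   -- residual sum of squares of Eq. (19.5.1) (lambda2 = lambda3 = 0)
    rss_unrestricted -- residual sum of squares of Eq. (19.5.2)
    q                -- number of restrictions (2 here)
    n, k             -- sample size and number of parameters in Eq. (19.5.2)
    """
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / (n - k))


# Hypothetical residual sums of squares for 50 observations; Eq. (19.5.2) has
# k = 6 parameters (beta0, beta2, beta3, alpha1, lambda2, lambda3).
F = f_statistic(120.0, 100.0, q=2, n=50, k=6)
print(round(F, 2))   # compare with the F(2, 44) critical value
```

A value exceeding the chosen critical value of F with (2, n - 6) degrees of freedom would lead to rejecting λ2 = λ3 = 0, that is, to treating Y2 and Y3 as endogenous.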


Summary and Conclusions 


1. The problem of identification precedes the problem of estimation.

2. The identification problem asks whether one can obtain unique numerical estimates of the structural
coefficients from the estimated reduced-form coefficients.

3. If this can be done, an equation in a system of simultaneous equations is identified. If this cannot be 
done, that equation is un- or under-identified. 

4. An identified equation can be just identified or overidentified. In the former case, unique values of 
structural coefficients can be obtained; in the latter, there may be more than one value for one or more 
structural parameters. 

5. The identification problem arises because the same set of data may be compatible with different sets of 
structural coefficients, that is, different models. Thus, in the regression of price on quantity only, it is 
difficult to tell whether one is estimating the supply function or the demand function, because price and 
quantity enter both equations. 

6. To assess the identifiability of a structural equation, one may apply the technique of reduced-form 
equations, which expresses an endogenous variable solely as a function of predetermined variables. 

7. However, this time-consuming procedure can be avoided by resorting to either the order condition or
the rank condition of identification. Although the order condition is easy to apply, it provides only a
necessary condition for identification. On the other hand, the rank condition is both a necessary and
sufficient condition for identification. If the rank condition is satisfied, the order condition is satisfied,
too, although the converse is not true. In practice, though, the order condition is generally adequate to
ensure identifiability.


8. In the presence of simultaneity, OLS is generally not applicable, as was shown in Chapter 18. But if
one wants to use it nonetheless, it is imperative to test for simultaneity explicitly. The Hausman
specification test can be used for this purpose.

9. Although in practice deciding whether a variable is endogenous or exogenous is a matter of judgment,
one can use the Hausman specification test to determine whether a variable or group of variables is
endogenous or exogenous.

10. Although they are in the same family, the concepts of causality and exogeneity are different and one
may not necessarily imply the other. In practice it is better to keep those concepts separate (see Section
17.14).


Multiple Choice Questions 


1. In a simultaneous-equation model, the endogenous variables are
a. Determined outside the model
b. Determined within the model
c. Nonstochastic variables
d. Variables whose values are predetermined

2. In an SEM, the predetermined variables are
a. Nonstochastic variables determined exogenously
b. Exogenous lagged variables
c. Lagged endogenous variables
d. All of the above

3. In which of the following equations is each endogenous variable expressed as a function of the
predetermined variables and the random error term?
a. Structural equation
b. Linear equation
c. Reduced-form equation
d. Simultaneous equation

4. The short-run or impact multipliers are the
a. Structural equation coefficients
b. Reduced-form coefficients
c. Linear model
d. First-differenced SEM

5. In an SEM, OLS is applied to estimate the coefficients of the
a. Structural equation
b. Linear equation
c. Reduced-form equation
d. Simultaneous equation

6. Indirect least squares is applied to estimate the coefficients of the
a. Structural equation
b. Linear equation
c. Reduced-form equation
d. Simultaneous equation

7. An equation in an SEM is identified
a. If the structural equation parameters can be obtained from the reduced-form estimates
b. If the reduced-form coefficients can be obtained from their structural equation estimates
c. If the underlying theory is strong
d. If all variables appear only once in the model

8. To estimate k unknown coefficients, we must have at least
a. k + 1 equations
b. (k)(k) equations
c. k + 1 equations
d. k equations

9. There is no unique way of estimating the parameters of an SEM if the model is
a. Underidentified
b. Overidentified
c. Either (a) or (b)
d. Exactly identified

10. An SEM is said to be an exactly identified model if
a. Unique numerical values of the structural parameters can be obtained
b. More than one numerical value can be obtained for some of the parameters of the structural
equations
c. A unique solution for all the structural coefficients is not possible
d. The structural coefficients cannot be estimated

11. An SEM is said to be overidentified if
a. Unique numerical values of the structural parameters can be obtained
b. More than one numerical value can be obtained for some of the parameters of the structural
equations
c. A unique solution for all the structural coefficients is not possible
d. The structural coefficients cannot be estimated

12. An SEM is said to be underidentified if
a. Unique numerical values of the structural parameters can be obtained
b. More than one numerical value can be obtained for some of the parameters of the structural
equations
c. A unique solution for all the structural coefficients is not possible
d. The structural coefficients cannot be estimated

13. For an SEM with k unknowns and k reduced-form equations, the model is said to be
a. Exactly identified
b. Overidentified
c. Underidentified
d. Unidentified

14. For an SEM with k unknowns and more than k equations in the reduced form, the model is said to be
a. Exactly identified
b. Overidentified
c. Underidentified
d. Unidentified

15. For an exactly identified equation, the order condition that should be fulfilled is that the equation must
a. Exclude one less than the total number of endogenous variables in the model
b. Include one less than the total number of endogenous variables in the model
c. Include only one of the endogenous variables in the model
d. Exclude two or more endogenous variables
16. For an overidentified equation, the order condition that should be fulfilled is that the equation must
a. Exclude one less than the total number of endogenous variables in the model
b. Include one less than the total number of endogenous variables in the model
c. Include only one of the endogenous variables in the model
d. Exclude two or more endogenous variables

17. For an SEM to be identified, the necessary and sufficient condition is the
a. Order condition
b. Rank condition
c. Both (a) and (b) above
d. Neither (a) nor (b) above

18. The rank condition of identification states that in a model containing M equations in M endogenous
variables and K predetermined variables, an equation is identified if and only if at least one nonzero
determinant of the following order can be constructed from the coefficients of the variables excluded
from that particular equation but included in the other equations of the model:
a. M × K
b. (M - 1)(K - 1)
c. (M - 1)(M - 1)
d. (K - 1)(K - 1)

19. Hausman's specification error test is used to test whether
a. An exogenous variable is correlated with the error term
b. An endogenous variable is correlated with the error term
c. Either (a) or (b) above
d. The OLS method is appropriate to estimate the SEM

20. Under Hausman's specification error test, the H0 tested is that
a. There is no simultaneity in the model
b. There are no exogenous variables in the model
c. There are no endogenous variables in the model
d. There is no specification error in the model


Exercises


Questions

19.1. Show that the two definitions of the order condition of identification (see Section 19.3) are equivalent.

19.2. Deduce the structural coefficients from the reduced-form coefficients given in Eqs. (19.2.25) and
(19.2.27).

19.3. Obtain the reduced form of the following models and determine in each case whether the structural
equations are unidentified, just identified, or overidentified:
a. Chap. 18, Example 18.2.
b. Chap. 18, Example 18.3.
c. Chap. 18, Example 18.6.

19.4. Check the identifiability of the models of Exercise 19.3 by applying both the order and rank conditions
of identification.

19.5. In the model (19.2.22) of the text it was shown that the supply equation was overidentified. What
restrictions, if any, on the structural parameters will make this equation just identified? Justify the
restrictions you impose.


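19.6. From the model

$$Y_{1t} = \beta_{10} + \beta_{12}Y_{2t} + \gamma_{11}X_{1t} + u_{1t}$$
$$Y_{2t} = \beta_{20} + \beta_{21}Y_{1t} + \gamma_{22}X_{2t} + u_{2t}$$

the following reduced-form equations are obtained:

$$Y_{1t} = \Pi_{10} + \Pi_{11}X_{1t} + \Pi_{12}X_{2t} + w_t$$
$$Y_{2t} = \Pi_{20} + \Pi_{21}X_{1t} + \Pi_{22}X_{2t} + v_t$$

a. Are the structural equations identified?
b. What happens to identification if it is known a priori that γ11 = 0?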

19.7. Refer to Exercise 19.6. The estimated reduced-form equations are as follows:

$$Y_{1t} = 4 + 3X_{1t} + 8X_{2t}$$
$$Y_{2t} = 2 + 6X_{1t} + 10X_{2t}$$

a. Obtain the values of the structural parameters.
b. How would you test the null hypothesis that γ11 = 0?

19.8. The model

$$Y_{1t} = \beta_{10} + \beta_{12}Y_{2t} + \gamma_{11}X_{1t} + u_{1t}$$
$$Y_{2t} = \beta_{20} + \beta_{21}Y_{1t} + u_{2t}$$

produces the following reduced-form equations:

$$Y_{1t} = 4 + 8X_{1t}$$
$$Y_{2t} = 2 + 12X_{1t}$$

a. Which structural coefficients, if any, can be estimated from the reduced-form coefficients?
Demonstrate your contention.
b. How does the answer to (a) change if it is known a priori that (1) β12 = 0 and (2) β10 = 0?

19.9. Determine whether the structural equations of the model given in Exercise 18.8 are identified.

19.10. Refer to Exercise 18.7 and find out which structural equations can be identified.

19.11. Table 19.3 is a model in five equations with five endogenous variables Y and four exogenous
variables X:

Table 19.3
Coefficients of the Variables

Equation No.   Y1     Y2     Y3     Y4     Y5     X1     X2     X3     X4
1              1      β12    0      β14    0      γ11    0      0      γ14
2              0      1      β23    β24    0      0      γ22    γ23    0
3              β31    0      1      β34    β35    0      0      γ33    γ34
4              0      β42    0      1      0      γ41    0      γ43    0
5              β51    0      0      β54    1      0      γ52    γ53    0

Determine the identifiability of each equation with the aid of the order and rank conditions of
identification.
19.12. Consider the following extended Keynesian model of income determination:

$$\text{Consumption function:}\quad C_t = \beta_1 + \beta_2 Y_t - \beta_3 T_t + u_{1t}$$
$$\text{Investment function:}\quad I_t = \alpha_0 + \alpha_1 Y_{t-1} + u_{2t}$$
$$\text{Taxation function:}\quad T_t = \gamma_0 + \gamma_1 Y_t + u_{3t}$$
$$\text{Income identity:}\quad Y_t = C_t + I_t + G_t$$

where C = consumption expenditure
Y = income
I = investment
T = taxes
G = government expenditure
u's = the disturbance terms

In the model the endogenous variables are C, I, T, and Y and the predetermined variables are G and
Yt-1.
By applying the order condition, check the identifiability of each of the equations in the system and
of the system as a whole. What would happen if rt, the interest rate, assumed to be exogenous, were
to appear on the right-hand side of the investment function?

19.13. Refer to the data given in Table 18.1 of Chapter 18. Using these data, estimate the reduced-form
regressions (19.1.2) and (19.1.4). Can you estimate β0 and β1? Show your calculations. Is the model
identified? Why or why not?

19.14. Suppose we propose yet another definition of the order condition of identifiability:

$$K \ge m + k - 1$$

which states that the number of predetermined variables in the system can be no less than the number
of unknown coefficients in the equation to be identified. Show that this definition is equivalent to the
two other definitions of the order condition given in the text.
19.15. A simplified version of Suits's model of the watermelon market is as follows:*

$$\text{Demand equation:}\quad P_t = \alpha_0 + \alpha_1 (Q_t/N_t) + \alpha_2 (Y_t/N_t) + \alpha_3 F_t + u_{1t}$$
$$\text{Crop supply function:}\quad Q_t = \beta_0 + \beta_1 (P_t/W_t) + \beta_2 P_{t-1} + \beta_3 C_{t-1} + \beta_4 T_{t-1} + u_{2t}$$

where P = price
(Q/N) = per capita quantity demanded
(Y/N) = per capita income
Ft = freight costs
(P/W) = price relative to the farm wage rate
C = price of cotton
T = price of other vegetables
N = population
P and Q are the endogenous variables.
a. Obtain the reduced form.
b. Determine whether the demand, the supply, or both functions are identified.


*D. B. Suits, “An Econometric Model of the Watermelon Market,” Journal of Farm Economics, vol. 37, 1955, pp. 237-251.

Empirical Exercises


19.16. Consider the following demand-and-supply model for money:

$$\text{Money demand:}\quad M_t^d = \beta_0 + \beta_1 Y_t + \beta_2 R_t + \beta_3 P_t + u_{1t}$$
$$\text{Money supply:}\quad M_t^s = \alpha_0 + \alpha_1 Y_t + u_{2t}$$


Table 19.4

Observation   M2         GDP         TBRATE    CPI
1970          626.5      3,771.9     6.458     38.8
1971          …          3,898.6     4.348     40.5
1972          802.3      4,105.0     4.071     41.8
1973          855.5      4,341.5     7.041     44.4
1974          902.1      4,319.6     7.886     49.3
1975          1,016.2    4,311.2     5.838     53.8
1976          1,152.0    4,540.9     4.989     56.9
1977          1,270.3    4,750.5     5.265     60.6
1978          1,366.0    5,015.0     7.221     65.2
1979          1,473.7    5,173.4     10.041    72.6
1980          1,599.8    5,161.7     11.506    82.4
1981          1,755.9    …           14.029    90.9
1982          1,910.1    5,189.3     10.686    96.5
1983          2,126.4    5,423.8     8.63      99.6
1984          2,309.8    5,813.6     9.58      103.9
1985          2,495.5    6,053.7     7.48      107.6
1986          2,732.2    6,263.6     5.98      109.6
1987          2,831.3    6,475.1     5.82      113.6
1988          2,994.3    6,742.7     6.69      118.3
1989          3,158.3    6,981.4     8.12      124.0
1990          3,277.7    7,112.5     7.51      130.7
1991          3,378.3    7,100.5     5.42      136.2
1992          3,431.8    7,336.6     3.45      140.3
1993          3,482.5    7,532.7     3.02      144.5
1994          3,498.5    7,835.5     4.29      148.2
1995          3,641.7    8,031.7     …         152.4
1996          3,820.5    8,328.9     5.02      156.9
1997          4,035.0    8,703.5     5.07      160.5
1998          4,381.8    9,066.9     4.81      163.0
1999          4,639.2    9,470.3     4.66      166.6
2000          4,921.7    9,817.0     5.85      172.2
2001          5,433.5    9,890.7     3.45      177.1
2002          5,779.2    10,048.8    1.62      179.9
2003          6,071.2    10,301.0    1.02      184.0
2004          6,421.6    10,675.8    1.38      188.9
2005          6,691.7    11,003.4    3.16      195.3
2006          7,035.5    11,319.4    4.73      201.6

Notes: M = M2 money supply (billions of dollars).
GDP = gross domestic product (billions of dollars).
TBRATE = 3-month Treasury bill rate, %.
CPI = Consumer Price Index (1982-1984 = 100).

Source: Economic Report of the President, 2007, Tables B-2, B-60, B-69, B-73.

where M = money
Y = income
R = rate of interest
P = price
u's = error terms

Assume that R and P are exogenous and M and Y are endogenous. Table 19.4 gives data on
M (M2 definition), Y (GDP), R (3-month Treasury bill rate), and P (Consumer Price Index) for the
United States for 1970-2006.
a. Is the demand function identified?
b. Is the supply function identified?
c. Obtain the expressions for the reduced-form equations for M and Y.
d. Apply the test of simultaneity to the supply function.
e. How would we find out if Y in the money supply function is in fact endogenous?
19.17. The Hausman test discussed in the text can also be conducted in the following way. Consider Eq.
(19.4.7):

$$Q_t = \beta_0 + \beta_1 \hat{P}_t + \beta_1 \hat{v}_t + u_{2t}$$

a. Since P̂t and v̂t have the same coefficients, how would you test, in a given application, that this is
indeed the case? What are the implications of this?
b. Since P̂t is uncorrelated with u2t by design (why?), one way to find out if Pt is exogenous is to see
if v̂t is correlated with u2t. How would you go about testing this? Which test do you use? (Hint:
Substitute P̂t from [19.4.6] into Eq. [19.4.7].)


Key to Multiple Choice Questions 


1. (b) 2. (d) 3. (c) 4. (b) 5. (c) 6. (a) 7. (a) 8. (d) 9. (d)
10. (a) 11. (b) 12. (…) 13. (a) 14. (b) 15. (…) 16. (a) 17. (b) 18. (c)
19. (b) 20. (a)


CHAPTER 20 


Simultaneous-Equation Methods


Having discussed the nature of the simultaneous-equation models in the previous two chapters, in this 
chapter we turn to the problem of estimation of the parameters of such models. At the outset it may be 
noted that the estimation problem is rather complex because there are a variety of estimation techniques 
with varying statistical properties. In view of the introductory nature of this text, we shall consider only a 
few of these techniques. Our discussion will be simple and often heuristic, the finer points being left to the 
references. 


20.1 Approaches to Estimation 


If we consider the general M equations model in M endogenous variables given in Eq. (19.1.1), we may 
adopt two approaches to estimate the structural equations, namely, single-equation methods, also known 
as limited information methods, and system methods, also known as full information methods. In the 
single-equation methods to be considered shortly, we estimate each equation in the system (of simultaneous 
equations) individually, taking into account any restrictions placed on that equation (such as exclusion of 
some variables) without worrying about the restrictions on the other equations in the system,¹ hence the name
limited information methods. In the system methods, on the other hand, we estimate all the equations in the 
model simultaneously, taking due account of all restrictions on such equations by the omission or absence of 
some variables (recall that for identification such restrictions are essential), hence the name full information 
methods. 


1For the purpose of identification, however, information provided by other equations will have to be taken into account.
But as noted in Chapter 19, estimation is possible only in the case of exactly identified or overidentified equations. In this
chapter we assume that the identification problem is solved using the techniques of Chapter 19.

As an example, consider the following four-equation model:

$$\begin{aligned}
Y_{1t} &= \beta_{10} + \beta_{12}Y_{2t} + \beta_{13}Y_{3t} + \gamma_{11}X_{1t} + u_{1t} \\
Y_{2t} &= \beta_{20} + \beta_{23}Y_{3t} + \gamma_{21}X_{1t} + \gamma_{22}X_{2t} + u_{2t} \\
Y_{3t} &= \beta_{30} + \beta_{31}Y_{1t} + \beta_{34}Y_{4t} + \gamma_{31}X_{1t} + \gamma_{32}X_{2t} + u_{3t} \\
Y_{4t} &= \beta_{40} + \beta_{42}Y_{2t} + \gamma_{43}X_{3t} + u_{4t}
\end{aligned} \tag{20.1.1}$$


where the Y's are the endogenous variables and the X's are the exogenous variables. If we are interested in
estimating, say, the third equation, the single-equation methods will consider this equation only, noting that
variables Y2 and X3 are excluded from it. In the systems methods, on the other hand, we try to estimate all
four equations simultaneously, taking into account all the restrictions imposed on the various equations of
the system.

To preserve the spirit of simultaneous-equation models, ideally one should use the systems method, such
as the full information maximum likelihood (FIML) method.² In practice, however, such methods are
not commonly used for a variety of reasons. First, the computational burden is enormous. For example, the
comparatively small (20 equations) 1955 Klein–Goldberger model of the U.S. economy had 151 nonzero
coefficients, of which the authors estimated only 51 coefficients using the time series data. The
Brookings-Social Science Research Council (SSRC) econometric model of the U.S. economy published in 1965
initially had 150 equations.³ Although such elaborate models may furnish finer details of the various sectors of the
economy, the computations are a stupendous task even in these days of high-speed computers, not to mention 
the cost involved. Second, the systems methods, such as FIML, lead to solutions that are highly nonlinear 
in the parameters and are therefore often difficult to determine. Third, if there is a specification error (say, a 
wrong functional form or exclusion of relevant variables) in one or more equations of the system, that error is 
transmitted to the rest of the system. As a result, the systems methods become very sensitive to specification 
errors. 

In practice, therefore, single-equation methods are often used. As Klein puts it, 


Single equation methods, in the context of a simultaneous system, may be less sensitive to specification error in the sense that those parts of the system that are correctly specified may not be affected appreciably by errors in specification in another part.⁴


In the rest of the chapter we shall deal with single-equation methods only. Specifically, we shall discuss 
the following single-equation methods: 


1. Ordinary least squares (OLS)
2. Indirect least squares (ILS) 
3. Two-stage least squares (2SLS) 


²For a simple discussion of this method, see Carl F. Christ, Econometric Models and Methods, John Wiley & Sons, New York, 1966, pp. 395–401.

³James S. Duesenberry, Gary Fromm, Lawrence R. Klein, and Edwin Kuh, eds., A Quarterly Model of the United States Economy, Rand McNally, Chicago, 1965.

⁴Lawrence R. Klein, A Textbook of Econometrics, 2d ed., Prentice Hall, Englewood Cliffs, NJ, 1974, p. 150.




20.2 Recursive Models and Ordinary Least Squares 


We saw in Chapter 18 that, because of the interdependence between the stochastic disturbance term and the 
endogenous explanatory variable(s), the OLS method is inappropriate for the estimation of an equation in a 
system of simultaneous equations. If applied erroneously, then, as we saw in Section 18.3, the estimators are 
not only biased (in small samples) but also inconsistent; that is, the bias does not disappear no matter how 
large the sample size. There is, however, one situation where OLS can be applied appropriately even in the 
context of simultaneous equations. This is the case of the recursive, triangular, or causal models. To see the 
nature of these models, consider the following three-equation system: 


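$$
\begin{aligned}
Y_{1t} &= \beta_{10} + \gamma_{11} X_{1t} + \gamma_{12} X_{2t} + u_{1t} \\
Y_{2t} &= \beta_{20} + \beta_{21} Y_{1t} + \gamma_{21} X_{1t} + \gamma_{22} X_{2t} + u_{2t} \\
Y_{3t} &= \beta_{30} + \beta_{31} Y_{1t} + \beta_{32} Y_{2t} + \gamma_{31} X_{1t} + \gamma_{32} X_{2t} + u_{3t}
\end{aligned}
\tag{20.2.1}
$$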


where, as usual, the Y's and the X’s are, respectively, the endogenous and exogenous variables. The distur- 
bances are such that 


$$\operatorname{cov}(u_{1t}, u_{2t}) = \operatorname{cov}(u_{1t}, u_{3t}) = \operatorname{cov}(u_{2t}, u_{3t}) = 0$$


that is, the same-period disturbances in different equations are uncorrelated (technically, this is the assumption 
of zero contemporaneous correlation). 

Now consider the first equation of (20.2.1). Since it contains only the exogenous variables on the right-hand side and since by assumption they are uncorrelated with the disturbance term u1t, this equation satisfies the critical assumption of the classical OLS, namely, uncorrelatedness between the explanatory variables and the stochastic disturbances. Hence, OLS can be applied straightforwardly to this equation. Next consider the second equation of (20.2.1), which contains the endogenous variable Y1 as an explanatory variable along with the nonstochastic X’s. Now OLS can also be applied to this equation, provided Y1t and u2t are uncorrelated. Is this so? The answer is yes because u1t, which affects Y1t, is by assumption uncorrelated with u2t. Therefore, for all practical purposes, Y1 is a predetermined variable insofar as Y2 is concerned. Hence, one can proceed with OLS estimation of this equation. Carrying this argument a step further, we can also apply OLS to the third equation in (20.2.1) because both Y1 and Y2 are uncorrelated with u3t.
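The argument can be checked by simulation. The sketch below uses a deliberately simplified two-equation recursive system with made-up parameter values (the X’s are dropped from the second equation for brevity) and estimates each equation separately by OLS. With rho = 0, the zero contemporaneous correlation assumed in the text, both slopes should come out near their true values in a large sample; a nonzero rho shows why that assumption matters.

```python
import math
import random


def ols_simple(y, x):
    """OLS intercept and slope from regressing y on a single regressor x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def simulate_recursive(n, rho=0.0, seed=0):
    """Simulate a hypothetical two-equation recursive system:

        Y1 = 1.0 + 0.5 * X  + u1
        Y2 = 2.0 + 0.8 * Y1 + u2

    u1 and u2 are standard normal with correlation `rho`.
    """
    rng = random.Random(seed)
    x, y1, y2 = [], [], []
    for _ in range(n):
        xi = rng.uniform(0.0, 10.0)
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u1 = e1
        u2 = rho * e1 + math.sqrt(1.0 - rho ** 2) * e2  # corr(u1, u2) = rho
        y1i = 1.0 + 0.5 * xi + u1
        x.append(xi)
        y1.append(y1i)
        y2.append(2.0 + 0.8 * y1i + u2)
    return x, y1, y2


if __name__ == "__main__":
    x, y1, y2 = simulate_recursive(20000)
    print(ols_simple(y1, x))    # should be near (1.0, 0.5)
    print(ols_simple(y2, y1))   # should be near (2.0, 0.8)
    x, y1, y2 = simulate_recursive(20000, rho=0.8)
    print(ols_simple(y2, y1))   # slope pushed above 0.8
```

With correlated disturbances, Y1 is no longer predetermined with respect to the second equation, and the OLS slope drifts away from 0.8 however large the sample.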

Thus, in the recursive system OLS can be applied to each equation separately. Actually, we do not have a 
simultaneous-equation problem in this situation. From the structure of such systems, it is clear that there is no 
interdependence among the endogenous variables. Thus, Y1 affects Y2, but Y2 does not affect Y1. Similarly, Y1 and Y2 influence Y3 without, in turn, being influenced by Y3. In other words, each equation exhibits a unilateral 
causal dependence, hence the name causal models.° Schematically, we have Figure 20.1. 


⁵The alternative name triangular stems from the fact that if we form the matrix of the coefficients of the endogenous variables given in Eq. (20.2.1), we obtain the following triangular matrix:

$$
\begin{array}{l|ccc}
 & Y_{1t} & Y_{2t} & Y_{3t} \\ \hline
\text{Equation 1} & 1 & 0 & 0 \\
\text{Equation 2} & \beta_{21} & 1 & 0 \\
\text{Equation 3} & \beta_{31} & \beta_{32} & 1
\end{array}
$$

Note that the entries above the main diagonal are zeros (why?).
[Figure 20.1 Recursive model: diagram showing the exogenous variables (X1, X2) and the disturbances u1, u2, u3.]


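As an example of a recursive system, one may postulate the following model of wage and price determination:

$$
\begin{aligned}
\text{Price equation:}\quad \dot P_t &= \beta_{10} + \beta_{11} \dot W_{t-1} + \beta_{12} \dot R_t + \beta_{13} \dot M_t + \beta_{14} \dot L_t + u_{1t} \\
\text{Wage equation:}\quad \dot W_t &= \beta_{20} + \beta_{21} \mathrm{UN}_t + \beta_{22} \dot P_t + u_{2t}
\end{aligned}
\tag{20.2.2}
$$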


where Ṗ = rate of change of price per unit of output
Ẇ = rate of change of wages per employee
Ṙ = rate of change of price of capital
Ṁ = rate of change of import prices
L̇ = rate of change of labor productivity
UN = unemployment rate, %⁶


The price equation postulates that the rate of change of price in the current period is a function of the 
rates of change in the prices of capital and of raw material, the rate of change in labor productivity, and the 
rate of change in wages in the previous period. The wage equation shows that the rate of change in wages in 
the current period is determined by the current period rate of change in price and the unemployment rate. It 
is clear that the causal chain runs from Ẇt−1 → Ṗt → Ẇt, and hence OLS may be applied to estimate the 
parameters of the two equations individually. 

Although recursive models have proved to be useful, most simultaneous-equation models do not exhibit 
such a unilateral cause-and-effect relationship. Therefore, OLS, in general, is inappropriate to estimate a 
single equation in the context of a simultaneous-equation model.⁷

There are some who argue that, although OLS is generally inapplicable to simultaneous-equation models, 
one can use it, if only as a standard or norm of comparison. That is, one can estimate a structural equation 
by OLS, with the resulting properties of biasedness, inconsistency, etc. Then the same equation may be 
estimated by other methods especially designed to handle the simultaneity problem and the results of the 
two methods compared, at least qualitatively. In many applications the results of the inappropriately applied 
OLS may not differ very much from those obtained by more sophisticated methods, as we shall see later. 


⁶Note: The dotted symbol means “time derivative.” For example, Ṗ = dP/dt. For discrete time series, dP/dt is sometimes approximated by ΔP/Δt, where the symbol Δ is the first difference operator, which was originally introduced in Chapter 12.

⁷It is important to keep in mind that we are assuming that the disturbances across equations are contemporaneously uncorrelated. If this is not the case, we may have to resort to the Zellner SURE (seemingly unrelated regressions) estimation technique to estimate the parameters of the recursive system. See A. Zellner, “An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias,” Journal of the American Statistical Association, vol. 57, 1962, pp. 348–368.
In principle, one should not have much objection to the presentation of results based on OLS so long as estimates based on alternative methods devised for simultaneous-equation models are also given. In fact, this approach might give us some idea about how badly OLS does in situations when it is applied inappropriately.⁸


20.3 Estimation of a Just Identified Equation: The Method of Indirect 
Least Squares (ILS) 


For a just or exactly identified structural equation, the method of obtaining the estimates of the structural 
coefficients from the OLS estimates of the reduced-form coefficients is known as the method of indirect 
least squares (ILS), and the estimates thus obtained are known as the indirect least-squares estimates. ILS 
involves the following three steps: 


Step 1. We first obtain the reduced-form equations. As noted in Chapter 19, these reduced-form equations 
are obtained from the structural equations in such a manner that the dependent variable in each equation is 
the only endogenous variable and is a function solely of the predetermined (exogenous or lagged endog- 
enous) variables and the stochastic error term(s). 


Step 2. We apply OLS to the reduced-form equations individually. This operation is permissible since 
the explanatory variables in these equations are predetermined and hence uncorrelated with the stochastic 
disturbances. The estimates thus obtained are consistent.⁹


Step 3. We obtain estimates of the original structural coefficients from the estimated reduced-form coeffi- 
cients obtained in Step 2. As noted in Chapter 19, if an equation is exactly identified, there is a one-to-one 
correspondence between the structural and reduced-form coefficients; that is, one can derive unique 
estimates of the former from the latter. 


As this three-step procedure indicates, the name ILS derives from the fact that structural coefficients (the 
object of primary enquiry in most cases) are obtained indirectly from the OLS estimates of the reduced-form 
coefficients. 


An Illustrative Example 


Consider the demand-and-supply model introduced in Section 19.2, which for convenience is given below 
with a slight change in notation: 


Demand function: $$Q_t = \alpha_0 + \alpha_1 P_t + \alpha_2 X_t + u_{1t} \tag{20.3.1}$$

Supply function: $$Q_t = \beta_0 + \beta_1 P_t + u_{2t} \tag{20.3.2}$$


where Q = quantity 

P = price 

X = income or expenditure 
Assume that X is exogenous. As noted previously, the supply function is exactly identified whereas the 
demand function is not identified. 


⁸It may also be noted that in small samples the alternative estimators, like the OLS estimators, are also biased. But the OLS estimator has the “virtue” that it has minimum variance among these alternative estimators. But this is true of small samples only.

⁹In addition to being consistent, the estimates “may be best unbiased and/or asymptotically efficient, depending respectively upon whether (i) the z’s [= X’s] are exogenous and not merely predetermined [i.e., do not contain lagged values of endogenous variables] and/or (ii) the distribution of the disturbances is normal.” See W. C. Hood and Tjalling C. Koopmans, Studies in Econometric Method, John Wiley & Sons, New York, 1953, p. 133.
The reduced-form equations corresponding to the preceding structural equations are

$$P_t = \Pi_0 + \Pi_1 X_t + w_t \tag{20.3.3}$$

$$Q_t = \Pi_2 + \Pi_3 X_t + v_t \tag{20.3.4}$$


where the Π’s are the reduced-form coefficients and are (nonlinear) combinations of the structural coefficients, as shown in Eqs. (19.2.16) and (19.2.18), and where w and v are linear combinations of the structural disturbances u1 and u2.

Notice that each reduced-form equation contains only one endogenous variable, which is the dependent 
variable and which is a function solely of the exogenous variable X (income) and the stochastic disturbances. 
Hence, the parameters of the preceding reduced-form equations may be estimated by OLS. These estimates 
are 


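$$\hat\Pi_1 = \frac{\sum p_t x_t}{\sum x_t^2} \tag{20.3.5}$$

$$\hat\Pi_0 = \bar P - \hat\Pi_1 \bar X \tag{20.3.6}$$

$$\hat\Pi_3 = \frac{\sum q_t x_t}{\sum x_t^2} \tag{20.3.7}$$

$$\hat\Pi_2 = \bar Q - \hat\Pi_3 \bar X \tag{20.3.8}$$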


where the lowercase letters, as usual, denote deviations from sample means and where Q̄, P̄, and X̄ are the sample mean values of Q, P, and X. As noted previously, the Π̂i’s are consistent estimators and under appropriate assumptions are also minimum variance unbiased or asymptotically efficient (see footnote 9).
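Formulas (20.3.5) to (20.3.8) translate directly into code. The helper below is a minimal sketch (the name reduced_form is ours, not the text’s) that computes the four reduced-form estimates from raw series by working with deviations from sample means.

```python
def reduced_form(P, Q, X):
    """Return (Pi0, Pi1, Pi2, Pi3) from the OLS regressions of P on X and of Q on X,
    computed with the deviation-form formulas (20.3.5)-(20.3.8)."""
    n = len(X)
    xbar, pbar, qbar = sum(X) / n, sum(P) / n, sum(Q) / n
    x = [xi - xbar for xi in X]          # deviations from sample means
    p = [pi - pbar for pi in P]
    q = [qi - qbar for qi in Q]
    sxx = sum(xi * xi for xi in x)
    pi1 = sum(pi * xi for pi, xi in zip(p, x)) / sxx    # (20.3.5)
    pi0 = pbar - pi1 * xbar                             # (20.3.6)
    pi3 = sum(qi * xi for qi, xi in zip(q, x)) / sxx    # (20.3.7)
    pi2 = qbar - pi3 * xbar                             # (20.3.8)
    return pi0, pi1, pi2, pi3


# Toy data lying exactly on P = 1 + 2X and Q = -2 + 3X (illustrative only).
print(reduced_form([3, 5, 7, 9], [1, 4, 7, 10], [1, 2, 3, 4]))  # (1.0, 2.0, -2.0, 3.0)
```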

Since our primary objective is to determine the structural coefficients, let us see if we can estimate them 
from the reduced-form coefficients. Now as shown in Section 19.2, the supply function is exactly identified. 
Therefore, its parameters can be estimated uniquely from the reduced-form coefficients as follows: 


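$$\beta_0 = \Pi_2 - \beta_1 \Pi_0 \quad \text{and} \quad \beta_1 = \frac{\Pi_3}{\Pi_1}$$

Hence, the estimates of these parameters can be obtained from the estimates of the reduced-form coefficients as

$$\hat\beta_0 = \hat\Pi_2 - \hat\beta_1 \hat\Pi_0 \tag{20.3.9}$$

$$\hat\beta_1 = \frac{\hat\Pi_3}{\hat\Pi_1} \tag{20.3.10}$$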


which are the ILS estimators. Note that the parameters of the demand function cannot be thus estimated 
(however, see Exercise 20.13). 
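To see where (20.3.9) and (20.3.10) come from, substitute the price reduced form (20.3.3) into the supply function (20.3.2):

$$
\begin{aligned}
Q_t &= \beta_0 + \beta_1 (\Pi_0 + \Pi_1 X_t + w_t) + u_{2t} \\
    &= (\beta_0 + \beta_1 \Pi_0) + \beta_1 \Pi_1 X_t + (\beta_1 w_t + u_{2t})
\end{aligned}
$$

Matching this term by term with the quantity reduced form (20.3.4) gives Π2 = β0 + β1Π0, Π3 = β1Π1, and vt = β1wt + u2t; solving the first two relations for β1 and β0 yields the expressions above. The same substitution in the demand function gives Π2 = α0 + α1Π0 and Π3 = α1Π1 + α2, that is, two relations in three unknowns, which is why α0, α1, and α2 cannot be recovered this way.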


To give some numerical results, we obtained the data shown in Table 20.1. First we estimate the reduced-form equations, regressing separately price and quantity on per capita real consumption expenditure. The results are as follows:

$$
\begin{aligned}
\hat P_t &= 90.9601 + 0.0007 X_t \\
\text{se} &= (4.0517) \qquad (0.0002) \\
t &= (22.4499) \qquad (3.0060) \qquad R^2 = 0.2440
\end{aligned}
\tag{20.3.11}
$$

$$
\begin{aligned}
\hat Q_t &= 59.7618 + 0.0020 X_t \\
\text{se} &= (1.5600) \qquad (0.00009) \\
t &= (38.3080) \qquad (20.9273) \qquad R^2 = 0.9399
\end{aligned}
\tag{20.3.12}
$$

Using Eqs. (20.3.9) and (20.3.10), we obtain these ILS estimates:

$$\hat\beta_0 = -183.7043 \tag{20.3.13}$$

$$\hat\beta_1 = 2.6766 \tag{20.3.14}$$
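As an arithmetic check, the reported intercept can be reproduced from the reported slope and the reduced-form intercepts via (20.3.9). Applying (20.3.10) to the printed slopes gives about 2.86 rather than 2.6766, a gap that presumably reflects rounding of 0.0007 and 0.0020 to four decimal places in the printed output.

```python
# Printed reduced-form estimates from (20.3.11) and (20.3.12).
PI0, PI1 = 90.9601, 0.0007   # price equation: intercept, slope on X
PI2, PI3 = 59.7618, 0.0020   # quantity equation: intercept, slope on X
BETA1_REPORTED = 2.6766      # ILS slope reported in (20.3.14)

# Eq. (20.3.9): intercept implied by the reported slope.
beta0 = PI2 - BETA1_REPORTED * PI0

# Eq. (20.3.10) evaluated with the rounded printed slopes.
beta1_from_rounded = PI3 / PI1

print(f"beta0 = {beta0:.4f}")                                     # beta0 = -183.7020
print(f"beta1 from rounded slopes = {beta1_from_rounded:.4f}")   # 2.8571
```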


Table 20.1 Crop Production, Crop Prices, and per Capita Personal Consumption Expenditures, 2007 Dollars, United States, 1975–2004

Observation  Q      P      X
1975                       4,789
1976                       5,282
1977                       5,804
1978                       6,417
1979                       7,073
1980                       7,716
1981                       8,439
1982                       8,945
1983                       9,775
1984                       10,589
1985                       11,406
1986                       12,048
1987                       12,766
1988                       13,685
1989                       14,546
1990                       15,349
1991                       15,722
1992                       16,485
1993                       17,204
1994                       18,004
1995                       18,665
1996                       19,490
1997                       20,323
1998                       21,291
1999                       22,491
2000                       23,862
2001                       24,722
2002                       25,501
2003                       26,463
2004                       27,937

Notes: Q = index of crop production (1996 = 100). P = index of crop prices received by farmers (1990–1992 = 100). X = real per capita personal consumption expenditure.

Source: Economic Report of the President, 2007. Data on Q (Table B-99), on P (Table B-101), and on X (Table B-31). 
Therefore, the estimated ILS regression is¹⁰


$$\hat Q_t = -183.7043 + 2.6766 P_t \tag{20.3.15}$$
For comparison, we give the results of the (inappropriately applied) OLS regression of Q on P: 


$$
\begin{aligned}
\hat Q_t &= 20.89 + 0.6732 P_t \\
\text{se} &= (23.04) \qquad (0.2246) \\
t &= (0.9067) \qquad (2.9973) \qquad R^2 = 0.2430
\end{aligned}
\tag{20.3.16}
$$
These results show how OLS can distort the “true” picture when it is applied in inappropriate situations. 
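To see the kind of distortion meant here, the following sketch simulates equilibrium prices and quantities from a hypothetical demand-and-supply model with the same structure as (20.3.1) and (20.3.2). All parameter values are invented for illustration (the true supply slope is 1.5), and the sketch compares the ILS slope with the OLS slope of Q on P.

```python
import random


def ols_simple(y, x):
    """OLS intercept and slope of y on a single regressor x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def simulate_market(n, seed=1):
    """Equilibrium data from a hypothetical market:
         demand  Q = 100 - 1.0*P + 0.5*X + u1
         supply  Q =  10 + 1.5*P         + u2
    Equating the two gives P = 36 + 0.2*X + (u1 - u2)/2.5.
    """
    rng = random.Random(seed)
    X, P, Q = [], [], []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        u1, u2 = rng.gauss(0.0, 3.0), rng.gauss(0.0, 3.0)
        p = 36.0 + 0.2 * x + (u1 - u2) / 2.5
        X.append(x)
        P.append(p)
        Q.append(10.0 + 1.5 * p + u2)   # equilibrium quantity, read off the supply curve
    return X, P, Q


def ils_supply(X, P, Q):
    """ILS estimates (beta0, beta1) of the supply function via (20.3.9)-(20.3.10)."""
    pi0, pi1 = ols_simple(P, X)   # price reduced form
    pi2, pi3 = ols_simple(Q, X)   # quantity reduced form
    beta1 = pi3 / pi1
    return pi2 - beta1 * pi0, beta1


if __name__ == "__main__":
    X, P, Q = simulate_market(20000)
    print("OLS slope of Q on P:", ols_simple(Q, P)[1])
    print("ILS supply slope:  ", ils_supply(X, P, Q)[1])
```

Under these assumed parameters the OLS slope converges to roughly 0.38 (the ratio cov(Q, P)/var(P)), far from the supply slope of 1.5, whereas the ILS slope settles near 1.5.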


Properties of ILS Estimators 


We have seen that the estimators of the reduced-form coefficients are consistent and under appropriate 
assumptions also best unbiased or asymptotically efficient (see footnote 9). Do these properties carry over 
to the ILS estimators? It can be shown that the ILS estimators inherit all the asymptotic properties of the 
reduced-form estimators, such as consistency and asymptotic efficiency. But small-sample properties such as unbiasedness do not generally hold true. It is shown in Appendix 20A, Section 20A.1, that the ILS estimators β̂0 and β̂1 of the supply function given previously are biased but the bias disappears as the sample size increases indefinitely (that is, the estimators are consistent).¹¹
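Footnote 11 traces the small-sample problem to β̂1 being a ratio of two estimators. A quick Monte Carlo makes the contrast between small and large samples visible. The sketch below reuses an invented market (true supply slope 1.5) and reports the share of ILS estimates falling within 0.5 of the truth; in very small samples the estimates scatter widely, while at larger sample sizes they concentrate around 1.5, as consistency requires.

```python
import random


def slope(y, x):
    """OLS slope of y on x (regression with an intercept)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


def ils_slope(n, rng):
    """One ILS estimate of the supply slope (true value 1.5) from a sample of size n,
    drawn from the hypothetical market
        demand Q = 100 - P + 0.5X + u1,  supply Q = 10 + 1.5P + u2."""
    X, P, Q = [], [], []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        u1, u2 = rng.gauss(0.0, 3.0), rng.gauss(0.0, 3.0)
        p = 36.0 + 0.2 * x + (u1 - u2) / 2.5
        X.append(x)
        P.append(p)
        Q.append(10.0 + 1.5 * p + u2)
    return slope(Q, X) / slope(P, X)          # Pi3_hat / Pi1_hat, Eq. (20.3.10)


def share_near_truth(n, reps=200, band=0.5, seed=2):
    """Fraction of `reps` ILS estimates lying within `band` of the true slope 1.5."""
    rng = random.Random(seed)
    hits = sum(abs(ils_slope(n, rng) - 1.5) < band for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    for n in (20, 1000):
        print(n, share_near_truth(n))
```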


20.4 Estimation of an Overidentified Equation: The Method of 
Two-Stage Least Squares (2SLS) 


Consider the following model: 


Income function: $$Y_{1t} = \beta_{10} + \beta_{11} Y_{2t} + \gamma_{11} X_{1t} + \gamma_{12} X_{2t} + u_{1t} \tag{20.4.1}$$

Money supply function: $$Y_{2t} = \beta_{20} + \beta_{21} Y_{1t} + u_{2t} \tag{20.4.2}$$

where Y1 = income
Y2 = stock of money
X1 = investment expenditure
X2 = government expenditure on goods and services
The variables X1 and X2 are exogenous.

The income equation, a hybrid of the quantity-theory and Keynesian approaches to income determination, states 
that income is determined by money supply, investment expenditure, and government expenditure. The 
money supply function postulates that the stock of money is determined (by the Federal Reserve System) on 
the basis of the level of income. Obviously, we have a simultaneous-equation problem, which can be checked 
by the simultaneity test discussed in Chapter 19. 


¹⁰We have not presented the standard errors of the estimated structural coefficients because, as noted previously, these coefficients are generally nonlinear functions of the reduced-form coefficients and there is no simple method of estimating their standard errors from the standard errors of the reduced-form coefficients. For large samples, however, standard errors of the structural coefficients can be obtained approximately. For details, see Jan Kmenta, Elements of Econometrics, Macmillan, New York, 1971, p. 444.


¹¹Intuitively this can be seen as follows: E(β̂1) = β1 if E(Π̂3/Π̂1) = Π3/Π1. Now even if E(Π̂3) = Π3 and E(Π̂1) = Π1, it can be shown that E(Π̂3/Π̂1) ≠ E(Π̂3)/E(Π̂1); that is, the expectation of the ratio of two variables is in general not equal to the ratio of the expectations of the two variables. However, as shown in Appendix 20A.1, plim(Π̂3/Π̂1) = plim(Π̂3)/plim(Π̂1) = Π3/Π1, since Π̂3 and Π̂1 are consistent estimators.
Applying the order condition of identification, we can see that the income equation is underidentified whereas the money supply equation is overidentified. There is not much that can be done about the income equation short of changing the model specification. The overidentified money supply function may not be estimated by ILS because there are two estimates of β21 (the reader should verify this via the reduced-form coefficients).
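The two estimates can be seen numerically. Writing the reduced form as Y1 = Π0 + Π1X1 + Π2X2 + w and Y2 = Π3 + Π4X1 + Π5X2 + v, the money supply equation implies Π4 = β21Π1 and Π5 = β21Π2, so both Π̂4/Π̂1 and Π̂5/Π̂2 estimate β21. The sketch below (invented parameter values, with β21 = 0.4) computes both from simulated data; they are close to each other and to 0.4 but not identical, which is exactly the ILS difficulty.

```python
import random


def ols2(y, x1, x2):
    """OLS of y on a constant, x1 and x2; returns (b0, b1, b2).
    Solves the 2x2 normal equations in deviation form by Cramer's rule."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def simulate_income_money(n, sd=1.0, seed=3):
    """Hypothetical version of (20.4.1)-(20.4.2):
         Y1 = 1.0 + 0.5*Y2 + 2.0*X1 + 1.0*X2 + u1
         Y2 = 0.5 + 0.4*Y1 + u2
    """
    rng = random.Random(seed)
    X1, X2, Y1, Y2 = [], [], [], []
    for _ in range(n):
        x1, x2 = rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)
        u1, u2 = rng.gauss(0.0, sd), rng.gauss(0.0, sd)
        # Reduced form for Y1 after substituting the money supply equation.
        y1 = (1.25 + 2.0 * x1 + 1.0 * x2 + u1 + 0.5 * u2) / 0.8
        X1.append(x1)
        X2.append(x2)
        Y1.append(y1)
        Y2.append(0.5 + 0.4 * y1 + u2)
    return X1, X2, Y1, Y2


def two_ils_estimates(X1, X2, Y1, Y2):
    """The two ILS values of beta21 implied by the reduced form."""
    _, pi1, pi2 = ols2(Y1, X1, X2)   # Y1 = Pi0 + Pi1*X1 + Pi2*X2 + w
    _, pi4, pi5 = ols2(Y2, X1, X2)   # Y2 = Pi3 + Pi4*X1 + Pi5*X2 + v
    return pi4 / pi1, pi5 / pi2


if __name__ == "__main__":
    print(two_ils_estimates(*simulate_income_money(5000)))
```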

As a matter of practice, one may apply OLS to the money supply equation, but the estimates thus obtained will be inconsistent in view of the likely correlation between the stochastic explanatory variable Y1 and the stochastic disturbance term u2. Suppose, however, we find a “proxy” for the stochastic explanatory variable Y1 such that, although “resembling” Y1 (in the sense that it is highly correlated with Y1), it is uncorrelated with u2. Such a proxy is also known as an instrumental variable (see Chapter 17). If one can find such a proxy, OLS can be used straightforwardly to estimate the money supply function. But how does one obtain such an instrumental variable? One answer is provided by two-stage least squares (2SLS), developed independently by Henri Theil¹² and Robert Basmann.¹³ As the name indicates, the method involves two successive applications of OLS. The process is as follows:


Stage 1. To get rid of the likely correlation between Y1 and u2, first regress Y1 on all the predetermined variables in the whole system, not just that equation. In the present case, this means regressing Y1 on X1 and X2 as follows:

$$Y_{1t} = \hat\Pi_0 + \hat\Pi_1 X_{1t} + \hat\Pi_2 X_{2t} + \hat u_{1t} \tag{20.4.3}$$

where û1t are the usual OLS residuals. From Eq. (20.4.3) we obtain

$$\hat Y_{1t} = \hat\Pi_0 + \hat\Pi_1 X_{1t} + \hat\Pi_2 X_{2t} \tag{20.4.4}$$

where Ŷ1t is an estimate of the mean value of Y1 conditional upon the fixed X’s. Note that Eq. (20.4.3) is nothing but a reduced-form regression because only the exogenous or predetermined variables appear on the right-hand side.

Equation (20.4.3) can now be expressed as

$$Y_{1t} = \hat Y_{1t} + \hat u_{1t} \tag{20.4.5}$$

which shows that the stochastic Y1 consists of two parts: Ŷ1t, which is a linear combination of the nonstochastic X’s, and a random component û1t. Following the OLS theory, Ŷ1t and û1t are uncorrelated. (Why?)
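The answer to the “(Why?)” comes from the OLS normal equations: they force the residuals to sum to zero and to be orthogonal to every regressor, and hence to any linear combination of the regressors, such as Ŷ1t. The quick numerical check below, on arbitrary made-up numbers, illustrates this.

```python
def ols2(y, x1, x2):
    """OLS of y on a constant, x1 and x2; returns (b0, b1, b2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Arbitrary illustrative data (not from the text).
y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0]

b0, b1, b2 = ols2(y, x1, x2)
fitted = [b0 + b1 * a + b2 * c for a, c in zip(x1, x2)]
resid = [yi - fi for yi, fi in zip(y, fitted)]

sum_resid = sum(resid)                              # zero by the first normal equation
cross = sum(f * r for f, r in zip(fitted, resid))   # zero, so the sample covariance is zero
print(sum_resid, cross)                             # both tiny, up to floating-point error
```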


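Stage 2. The overidentified money supply equation can now be written as

$$
\begin{aligned}
Y_{2t} &= \beta_{20} + \beta_{21}(\hat Y_{1t} + \hat u_{1t}) + u_{2t} \\
&= \beta_{20} + \beta_{21} \hat Y_{1t} + (u_{2t} + \beta_{21} \hat u_{1t}) \\
&= \beta_{20} + \beta_{21} \hat Y_{1t} + u^*_{2t}
\end{aligned}
\tag{20.4.6}
$$

where u*2t = u2t + β21û1t.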

Comparing Eq. (20.4.6) with Eq. (20.4.2), we see that they are very similar in appearance, the only difference being that Y1 is replaced by Ŷ1. What is the advantage of Eq. (20.4.6)? It can be shown that although Y1 in the original money supply equation is correlated or likely to be correlated with the disturbance term u2 (hence rendering OLS inappropriate), Ŷ1t in Eq. (20.4.6) is uncorrelated with u*2t asymptotically, that is, in the large sample (or more accurately, as the sample size increases indefinitely). As a result, OLS can be applied to Eq. (20.4.6), which will give consistent estimates of the parameters of the money supply function.¹⁴

¹²Henri Theil, “Repeated Least-Squares Applied to Complete Equation Systems,” The Hague: The Central Planning Bureau, The Netherlands, 1953 (mimeographed).

¹³Robert L. Basmann, “A Generalized Classical Method of Linear Estimation of Coefficients in a Structural Equation,” Econometrica, vol. 25, 1957, pp. 77–83.

As this two-stage procedure indicates, the basic idea behind 2SLS is to “purify” the stochastic explanatory variable Y1 of the influence of the stochastic disturbance u2. This goal is accomplished by performing the reduced-form regression of Y1 on all the predetermined variables in the system (Stage 1), obtaining the estimates Ŷ1t, replacing Y1t in the original equation by the estimated Ŷ1t, and then applying OLS to the equation thus transformed (Stage 2). The estimators thus obtained are consistent; that is, they converge to their true values as the sample size increases indefinitely.
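The two stages can be sketched in a few lines. The example below is a minimal illustration on simulated data from an invented version of (20.4.1) and (20.4.2) (true β21 = 0.4, with fairly noisy disturbances so that the simultaneity bias is visible). Stage 1 regresses Y1 on X1 and X2, and Stage 2 regresses Y2 on the fitted Ŷ1; for comparison it also runs OLS of Y2 on the actual Y1.

```python
import random


def ols_simple(y, x):
    """OLS intercept and slope of y on a single regressor x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def ols2(y, x1, x2):
    """OLS of y on a constant, x1 and x2; returns (b0, b1, b2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def simulate_income_money(n, sd=5.0, seed=6):
    """Hypothetical version of (20.4.1)-(20.4.2):
         Y1 = 1.0 + 0.5*Y2 + 2.0*X1 + 1.0*X2 + u1
         Y2 = 0.5 + 0.4*Y1 + u2
    """
    rng = random.Random(seed)
    X1, X2, Y1, Y2 = [], [], [], []
    for _ in range(n):
        x1, x2 = rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)
        u1, u2 = rng.gauss(0.0, sd), rng.gauss(0.0, sd)
        # Reduced form for Y1 after substituting the money supply equation.
        y1 = (1.25 + 2.0 * x1 + 1.0 * x2 + u1 + 0.5 * u2) / 0.8
        X1.append(x1)
        X2.append(x2)
        Y1.append(y1)
        Y2.append(0.5 + 0.4 * y1 + u2)
    return X1, X2, Y1, Y2


def two_stage(X1, X2, Y1, Y2):
    """2SLS estimates (beta20, beta21) of the money supply function."""
    p0, p1, p2 = ols2(Y1, X1, X2)                            # Stage 1, Eq. (20.4.3)
    y1_hat = [p0 + p1 * a + p2 * b for a, b in zip(X1, X2)]  # Eq. (20.4.4)
    return ols_simple(Y2, y1_hat)                            # Stage 2, Eq. (20.4.6)


if __name__ == "__main__":
    X1, X2, Y1, Y2 = simulate_income_money(5000)
    print("2SLS:", two_stage(X1, X2, Y1, Y2))
    print("OLS: ", ols_simple(Y2, Y1))
```

In this design the OLS slope is pulled upward because Y1 contains part of u2, while the 2SLS slope stays close to 0.4.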

To illustrate 2SLS further, let us modify the income–money supply model as follows:

$$Y_{1t} = \beta_{10} + \beta_{11} Y_{2t} + \gamma_{11} X_{1t} + \gamma_{12} X_{2t} + u_{1t} \tag{20.4.7}$$

$$Y_{2t} = \beta_{20} + \beta_{21} Y_{1t} + \gamma_{23} X_{3t} + \gamma_{24} X_{4t} + u_{2t} \tag{20.4.8}$$

where, in addition to the variables already defined, X3 = income in the previous time period and X4 = money supply in the previous period. Both X3 and X4 are predetermined.

It can be readily verified that both Eqs. (20.4.7) and (20.4.8) are overidentified. To apply 2SLS, we proceed as follows: In Stage 1 we regress the endogenous variables on all the predetermined variables in the system. Thus,


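$$Y_{1t} = \hat\Pi_{10} + \hat\Pi_{11} X_{1t} + \hat\Pi_{12} X_{2t} + \hat\Pi_{13} X_{3t} + \hat\Pi_{14} X_{4t} + \hat u_{1t} \tag{20.4.9}$$

$$Y_{2t} = \hat\Pi_{20} + \hat\Pi_{21} X_{1t} + \hat\Pi_{22} X_{2t} + \hat\Pi_{23} X_{3t} + \hat\Pi_{24} X_{4t} + \hat u_{2t} \tag{20.4.10}$$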


In Stage 2 we replace Y1 and Y2 in the original (structural) equations by their estimated values from the preceding two regressions and then run the OLS regressions as follows:

$$Y_{1t} = \beta_{10} + \beta_{11} \hat Y_{2t} + \gamma_{11} X_{1t} + \gamma_{12} X_{2t} + u^*_{1t} \tag{20.4.11}$$

$$Y_{2t} = \beta_{20} + \beta_{21} \hat Y_{1t} + \gamma_{23} X_{3t} + \gamma_{24} X_{4t} + u^*_{2t} \tag{20.4.12}$$

where u*1t = u1t + β11û2t and u*2t = u2t + β21û1t. The estimates thus obtained will be consistent.
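For the two-equation version the same recipe applies to each equation, now with four predetermined variables in Stage 1. The sketch below implements it with plain-Python matrix algebra (Gaussian elimination on the normal equations, adequate for a handful of regressors but not how a production package would do it). The parameter values are invented, and X3 and X4 are simulated as exogenous draws rather than as genuinely lagged variables, which is a simplification.

```python
import random


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * k
    for r in range(k - 1, -1, -1):
        z[r] = (M[r][k] - sum(M[r][c] * z[c] for c in range(r + 1, k))) / M[r][r]
    return z


def ols(y, rows):
    """OLS coefficients of y on regressor rows (each row starts with 1.0 for the constant)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def fitted(y, rows):
    """Fitted values from the OLS regression of y on rows."""
    b = ols(y, rows)
    return [sum(bi * xi for bi, xi in zip(b, r)) for r in rows]


def simulate(n, seed=4):
    """Simulated data from hypothetical values of (20.4.7)-(20.4.8)."""
    b10, b11, g11, g12 = 1.0, 0.5, 2.0, 1.0
    b20, b21, g23, g24 = 0.5, 0.4, 1.5, -1.0
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(0.0, 10.0) for _ in range(4)]
        u1, u2 = rng.gauss(0.0, 2.0), rng.gauss(0.0, 2.0)
        # Reduced-form solution for Y1, then Y2 from its structural equation.
        y1 = (b10 + b11 * b20 + g11 * x[0] + g12 * x[1]
              + b11 * (g23 * x[2] + g24 * x[3]) + u1 + b11 * u2) / (1.0 - b11 * b21)
        y2 = b20 + b21 * y1 + g23 * x[2] + g24 * x[3] + u2
        data.append((x, y1, y2))
    return data


def two_sls_system(data):
    """2SLS estimates of (20.4.11) and (20.4.12)."""
    X = [d[0] for d in data]
    Y1 = [d[1] for d in data]
    Y2 = [d[2] for d in data]
    Z = [[1.0] + x for x in X]                        # all predetermined variables
    y1_hat, y2_hat = fitted(Y1, Z), fitted(Y2, Z)     # Stage 1: (20.4.9)-(20.4.10)
    eq1 = ols(Y1, [[1.0, y2h, x[0], x[1]] for y2h, x in zip(y2_hat, X)])
    eq2 = ols(Y2, [[1.0, y1h, x[2], x[3]] for y1h, x in zip(y1_hat, X)])
    return eq1, eq2   # [b10, b11, g11, g12], [b20, b21, g23, g24]


if __name__ == "__main__":
    print(two_sls_system(simulate(3000)))
```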
Note the following features of 2SLS. 


1. It can be applied to an individual equation in the system without directly taking into account any 
other equation(s) in the system. Hence, for solving econometric models involving a large number of 
equations, 2SLS offers an economical method. For this reason the method has been used extensively in 
practice. 


2. Unlike ILS, which provides multiple estimates of parameters in the overidentified equations, 2SLS 
provides only one estimate per parameter. 


¹⁴But note that in small samples Ŷ1t is likely to be correlated with u*2t. The reason is as follows: From Eq. (20.4.4) we see that Ŷ1t is a weighted linear combination of the predetermined X’s, with the Π̂’s as the weights. Now even if the predetermined variables are truly nonstochastic, the Π̂’s, being estimators, are stochastic. Therefore, Ŷ1t is stochastic too. Now from our discussion of the reduced-form equations and indirect least-squares estimation, it is clear that the reduced-form coefficients, the Π̂’s, are functions of the stochastic disturbances, such as u2. And since Ŷ1t depends on the Π̂’s, it is likely to be correlated with u2, which is a component of u*2t. As a result, Ŷ1t is expected to be correlated with u*2t. But as noted previously, this correlation disappears as the sample size tends to infinity. The upshot of all this is that in small samples the 2SLS procedure may lead to biased estimation.
3. It is easy to apply because all one needs to know is the total number of exogenous or predetermined variables in the system without knowing any other variables in the system.

4. Although specially designed to handle overidentified equations, the method can also be applied to exactly identified equations. But then ILS and 2SLS will give identical estimates. (Why?)

5. If the R² values in the reduced-form regressions (that is, Stage 1 regressions) are very high, say, in excess of 0.8, the classical OLS estimates and 2SLS estimates will be very close. But this result should not be surprising because if the R² value in the first stage is very high, it means that the estimated values of the endogenous variables are very close to their actual values, and hence the latter are less likely to be correlated with the stochastic disturbances in the original structural equations. (Why?)¹⁵ If, however, the R² values in the first-stage regressions are very low, the 2SLS estimates will be practically meaningless because we shall be replacing the original Y’s in the second-stage regressions by the estimated Ŷ’s from the first-stage regressions, which will essentially represent the disturbances in the first-stage regressions. In other words, in this case, the Ŷ’s will be very poor proxies for the original Y’s.

6. Notice that in reporting the ILS regression in Eq. (20.3.15) we did not state the standard errors of the estimated coefficients (for reasons explained in footnote 10). But we can do this for the 2SLS estimates because the structural coefficients are directly estimated from the second-stage (OLS) regressions. There is, however, a caution to be exercised. The estimated standard errors in the second-stage regressions need to be modified because, as can be seen from Eq. (20.4.6), the error term u*2t is, in fact, the original error term u2t plus β21û1t. Hence, the variance of u*2t is not exactly equal to the variance of the original u2t. However, the modification required can be easily effected by the formula given in Appendix 20A, Section 20A.2.

7. In using the 2SLS, bear in mind the following remarks of Henri Theil: 


The statistical justification of the 2SLS is of the large-sample type. When there are no lagged endogenous 
variables,. . . the 2SLS coefficient estimators are consistent if the exogenous variables are constant in repeated 
samples and if the disturbance[s] [appearing in the various behavioral or structural equations] . . . are indepen- 
dently and identically distributed with zero means and finite variances. . . . If these two conditions are satisfied, 
the sampling distribution of 2SLS coefficient estimators becomes approximately normal for large samples. . . . 

When the equation system contains lagged endogenous variables, the consistency and large-sample normality 
of the 2SLS coefficient estimators require an additional condition, . . . that as the sample increases the mean 
square of the values taken by each lagged endogenous variable converges in probability to a positive limit. . . . 

If [the disturbances appearing in the various structural equations are] not independently distributed, lagged endogenous variables are not independent of the current operation of the equation system . . . , which means these variables are not really predetermined. If these variables are nevertheless treated as predetermined in the 2SLS procedure, the resulting estimators are not consistent.¹⁶
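Feature 4 in the list above (identical ILS and 2SLS estimates for an exactly identified equation) can be checked numerically. For the supply function (20.3.2), the first-stage fitted price is Π̂0 + Π̂1X, so in deviation form the second-stage slope is Π̂1Σqx / (Π̂1²Σx²) = Π̂3/Π̂1, which is the ILS estimate, and the intercepts coincide as well. The sketch below, on simulated data from an invented market, confirms that the two sets of estimates agree up to floating-point rounding.

```python
import random


def ols_simple(y, x):
    """OLS intercept and slope of y on a single regressor x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def simulate_market(n, seed=5):
    """Equilibrium data from a hypothetical market (invented parameters):
         demand  Q = 100 - 1.0*P + 0.5*X + u1
         supply  Q =  10 + 1.5*P         + u2
    """
    rng = random.Random(seed)
    X, P, Q = [], [], []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        u1, u2 = rng.gauss(0.0, 3.0), rng.gauss(0.0, 3.0)
        p = 36.0 + 0.2 * x + (u1 - u2) / 2.5
        X.append(x)
        P.append(p)
        Q.append(10.0 + 1.5 * p + u2)
    return X, P, Q


X, P, Q = simulate_market(500)

# ILS, Eqs. (20.3.9)-(20.3.10)
pi0, pi1 = ols_simple(P, X)      # price reduced form
pi2, pi3 = ols_simple(Q, X)      # quantity reduced form
b1_ils = pi3 / pi1
b0_ils = pi2 - b1_ils * pi0

# 2SLS: Stage 1 fitted prices, Stage 2 OLS of Q on the fitted prices
P_hat = [pi0 + pi1 * x for x in X]
b0_2sls, b1_2sls = ols_simple(Q, P_hat)

print(b0_ils, b1_ils)
print(b0_2sls, b1_2sls)          # identical up to floating-point rounding
```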


20.5 2SLS: A Numerical Example 


To illustrate the 2SLS method, consider the income–money supply model given previously in Eqs. (20.4.1) 
and (20.4.2). As shown, the money supply equation is overidentified. To estimate the parameters of this 
equation, we resort to the two-stage least-squares method. The data required for analysis are given in Table 
20.2; this table also gives some data that are required to answer some of the questions given in the exercises. 


¹⁵In the extreme case of R² = 1 in the first-stage regression, the endogenous explanatory variable in the original (overidentified) equation will be practically nonstochastic (why?).

¹⁶Henri Theil, Introduction to Econometrics, Prentice Hall, Englewood Cliffs, NJ, 1978, pp. 341–342.
Table 20.2 GDP, M2, FEDEXP, TB6, USA, 1970–2005

Observation  GDP (Y1)   M2 (Y2)   GPDI (X1)  FEDEXP (X2)  TB6 (X3)
1970         3,221.9    626.5     427.1      201.1        6.562
1971         3,898.6    710.3     475.7      220.0        4.511
1972         4,105.0    802.3     532.1      244.4        4.466
1973                              594.4      261.7        7.178
1974         4,319.6    902.1     550.6      293.3        7.926
1975         4,311.2    1,016.2   453.1      346.2        6.122
1976         4,540.9    1,152.0   544.7      374.3        5.266
1977         4,750.5    1,270.3   627.0      407.5
1978         5,015.0    1,366.0   702.6      450.0        7.572
1979         5,173.4    1,473.7   725.0      497.5        10.017
1980                    1,599.8   645.3      585.7        11.374
1981                    1,755.4   704.9      672.7        13.776
1982         5,189.3              606.0      748.5        11.084
1983         5,423.8    2,126.5   662.5      815.4        8.75
1984         5,513.6    2,310.0   857.7      877.1        9.80
1985         6,053.7    2,495.7   849.7      948.2        7.66
1986         6,263.6    2,732.4   843.9      1,006.0      6.03
1987         6,475.1    2,831.4   870.0      1,041.6      6.05
1988         6,742.7    2,994.5   890.5      1,622.7      6.92
1989         6,981.4    3,158.5   926.2      675          8.04
1990                    3,278.6   895.1      1,253.5      7.47
1991         7,100.5              822.2      1,313.0      5.49
1992         7,336.6    3,432.5   889.0      1,444.6      3.57
1993                    3,484.0   968.3      1,496.0      3.14
1994         7,835.5    3,497.5   1,099.6    1,533.1      4.66
1995         8,031.7    3,640.4   1,134.0    1,603.5      5.59
1996                              1,234.3    1,665.8      5.09
1997         8,703.5    4,031.6   1,387.7    1,708.9      5.18
1998         9,066.9    4,379.0   1,524.1    1,734.9      4.85
1999         9,470.3    4,641.1   1,642.6    1,787.6      4.76
2000         9,817.0    4,920.9   1,733.3    1,864.4      5.92
2001         9,890.7    5,430.3   1,598.4                 3.39
2002         10,048.8   5,774.1   1,597.4    2,101.1      1.69
2003         10,301.0   6,062.0   1,613.1    2,252.1      1.06
2004         10,703.5   6,411.7   1,770.6    2,383.0      1.58
2005         11,048.6   6,669.4   1,866.3    2,555.9      3.40

Notes: Y1 = GDP = gross domestic product (billions of chained 2000 dollars).
Y2 = M2 = M2 money supply (billions of dollars).
X1 = GPDI = gross private domestic investment (billions of chained 2000 dollars).
X2 = FEDEXP = Federal government expenditure (billions of dollars).
X3 = TB6 = 6-month Treasury bill rate (%).


Source: Economic Report of the President, 2007. Tables B-2, B-69, B-84, and B-73. 


Stage 1 Regression


We first regress the stochastic explanatory variable income Y1, represented by GDP, on the predetermined variables private investment X1 and government expenditure X2, obtaining the following results:

$$
\begin{aligned}
\hat Y_{1t} &= 2689.848 + 1.8700 X_{1t} + 2.0343 X_{2t} \\
\text{se} &= (67.9874) \qquad (0.1717) \qquad (0.1075) \\
t &= (39.5639) \qquad (10.8938) \qquad (18.9295) \qquad R^2 = 0.9964
\end{aligned}
\tag{20.5.1}
$$


Stage 2 Regression 


We now estimate the money supply function (20.4.2), replacing the endogenous variable Y1 by Ŷ1 estimated from Eq. (20.5.1). The results are as follows:

$$
\begin{aligned}
\hat Y_{2t} &= -2440.180 + 0.7920 \hat Y_{1t} \\
\text{se} &= (127.3720) \qquad (0.0178) \\
t &= (-19.1579) \qquad (44.5246) \qquad R^2 = 0.9831
\end{aligned}
\tag{20.5.2}
$$


As we pointed out previously, the estimated standard errors given in Eq. (20.5.2) need to be corrected in the manner suggested in Appendix 20A, Section 20A.2. Effecting this correction (most econometric packages can do it now), we obtain the following results:

$$
\begin{aligned}
\hat Y_{2t} &= -2440.180 + 0.7920 \hat Y_{1t} \\
\text{se} &= (126.9598) \qquad (0.0212) \\
t &= (-17.3149) \qquad (37.3057) \qquad R^2 = 0.9803
\end{aligned}
\tag{20.5.3}
$$


As noted in Appendix 20A, Section 20A.2, the standard errors given in Eq. (20.5.3) do not differ much from those given in Eq. (20.5.2) because the $R^2$ in the Stage 1 regression is very high.
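The two stages and the standard-error correction can be sketched in Python with NumPy. This is a minimal sketch rather than the routine of any particular package: the data are simulated, the names `x1`, `x2`, `y1`, `y2` and all parameter values are hypothetical stand-ins for investment, government expenditure, GDP, and money supply, and the correction follows the idea described above, namely that the error variance is recomputed from residuals that use the actual $Y_{1t}$ rather than $\hat{Y}_{1t}$, with the same stage-2 coefficients.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def two_stage_least_squares(y2, y1, Z):
    """2SLS for y2 = b0 + b1*y1 + u, where y1 is endogenous and Z holds the
    predetermined variables (first column = constant)."""
    n = len(y2)
    # Stage 1: regress the endogenous regressor on all predetermined variables
    y1_hat = Z @ ols(Z, y1)
    # Stage 2: OLS of y2 on a constant and the fitted values
    X_hat = np.column_stack([np.ones(n), y1_hat])
    b = ols(X_hat, y2)
    k = X_hat.shape[1]
    XtX_inv = np.linalg.inv(X_hat.T @ X_hat)
    # Uncorrected: residuals of the stage-2 regression itself (they use y1_hat)
    e_naive = y2 - X_hat @ b
    # Corrected: same coefficients, but residuals computed with the ACTUAL y1
    e_corr = y2 - np.column_stack([np.ones(n), y1]) @ b
    se_naive = np.sqrt(e_naive @ e_naive / (n - k) * np.diag(XtX_inv))
    se_corr = np.sqrt(e_corr @ e_corr / (n - k) * np.diag(XtX_inv))
    return b, se_naive, se_corr

# Hypothetical simulated data (not the data of Table 20.2)
rng = np.random.default_rng(0)
n = 500
x1, x2, v = rng.normal(size=(3, n))          # exogenous X1, X2 and reduced-form error
y1 = 1.0 + 2.0 * x1 + 1.5 * x2 + v           # endogenous regressor ("income")
y2 = -3.0 + 0.8 * y1 + 0.5 * v + rng.normal(size=n)  # error correlated with y1
Z = np.column_stack([np.ones(n), x1, x2])
b, se_naive, se_corr = two_stage_least_squares(y2, y1, Z)
print(b, se_naive, se_corr)
```

Comparing `se_naive` with `se_corr` shows the size of the correction for this simulated sample; the coefficients themselves are unaffected by it.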


OLS Regression 


For comparison, we give the regression of money stock on income as shown in Eq. (20.4.2) without “purging” the stochastic $Y_{1t}$ of the influence of the stochastic disturbance term.

$$
\begin{aligned}
\hat{Y}_{2t} &= -2195.468 + 0.7911Y_{1t}\\
\text{se} &= (126.6460)\qquad (0.0211)\\
t &= (-17.3354)\qquad (37.3812)\qquad R^2 = 0.9803
\end{aligned}
\tag{20.5.4}
$$


Comparing the “inappropriate” OLS results with the Stage 2 regression, we see that the two regressions are virtually the same. Does this mean that the 2SLS procedure is worthless? Not at all. That the two results are practically identical in the present situation should not be surprising because, as noted previously, the $R^2$ value in the first stage is very high, which makes the estimated $\hat{Y}_{1t}$ virtually identical to the actual $Y_{1t}$. Therefore, in this case the OLS and second-stage regressions will be more or less similar. But there is no guarantee that this will happen in every application. An implication, then, is that in overidentified equations one should not accept the classical OLS procedure without checking the second-stage regression(s).
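A small simulation makes the role of the first-stage $R^2$ concrete. The sketch below is an illustration under assumed parameter values (a single hypothetical instrument `x`, a structural slope of 0.8, and a disturbance deliberately correlated with the reduced-form error); it compares the OLS slope with the 2SLS slope when the first stage fits very well and when it fits poorly.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    xd, yd = x - x.mean(), y - y.mean()
    return (xd @ yd) / (xd @ xd)

def compare_ols_2sls(pi, n=5000, beta=0.8, seed=1):
    """Simulate y1 = pi*x + v and y2 = beta*y1 + u with u correlated with v.
    Return (first-stage R^2, |OLS slope - 2SLS slope|)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    v = rng.normal(size=n)
    u = 0.8 * v + 0.6 * rng.normal(size=n)   # simultaneity: u is correlated with v
    y1 = pi * x + v
    y2 = beta * y1 + u
    # Stage 1: fitted y1 and the first-stage R^2
    y1_hat = y1.mean() + ols_slope(x, y1) * (x - x.mean())
    r2 = 1 - np.sum((y1 - y1_hat) ** 2) / np.sum((y1 - y1.mean()) ** 2)
    # Stage 2 slope versus the OLS slope that ignores simultaneity
    b_2sls = ols_slope(y1_hat, y2)
    b_ols = ols_slope(y1, y2)
    return r2, abs(b_ols - b_2sls)

r2_strong, gap_strong = compare_ols_2sls(pi=10.0)
r2_weak, gap_weak = compare_ols_2sls(pi=0.5)
print(r2_strong, gap_strong, r2_weak, gap_weak)
```

With `pi=10` the first-stage $R^2$ is close to 1 and the two slopes nearly coincide; with `pi=0.5` the first stage is much weaker and OLS drifts away from 2SLS, which is why the second-stage regression should be checked rather than assumed to match OLS.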


Simultaneity between GDP and Money Supply 


Let us find out if GDP ($Y_1$) and money supply ($Y_2$) are mutually dependent. For this purpose we use the Hausman test of simultaneity discussed in Chapter 19.

First we regress GDP on $X_1$ (investment expenditure) and $X_2$ (government expenditure), the exogenous variables in the system (i.e., we estimate the reduced-form regression). From this regression we obtain the estimated GDP and the residuals $\hat{v}_t$, as suggested in Eq. (19.4.7). Then we regress money supply on estimated GDP and $\hat{v}_t$ to obtain the following results:

$$
\begin{aligned}
\hat{Y}_{2t} &= -2198.297 + 0.7915\hat{Y}_{1t} + 0.6984\hat{v}_t\\
\text{se} &= (129.0548)\qquad (0.0215)\qquad (0.2970)\\
t &= (-17.0338)\qquad (36.7016)\qquad (2.3511)
\end{aligned}
\tag{20.5.5}
$$


Since the t value of $\hat{v}_t$ is statistically significant (the p value is 0.0263), we cannot reject the hypothesis of simultaneity between money supply and GDP, which should not be surprising. (Note: Strictly speaking, this conclusion is valid only in large samples; technically, it holds only as the sample size increases indefinitely.)
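The mechanics of a regression-based simultaneity test can be sketched as follows. This is a minimal sketch on simulated data, and it uses a common variant of the test: the structural equation is re-run with the actual $Y_{1t}$ plus the reduced-form residual $\hat{v}_t$, rather than with $\hat{Y}_{1t}$ plus $\hat{v}_t$. Because $\hat{Y}_{1t} = Y_{1t} - \hat{v}_t$, the $\hat{v}_t$ coefficient in this variant equals the $\hat{v}_t$ coefficient minus the $\hat{Y}_{1t}$ coefficient from a regression in the form of Eq. (20.5.5). The p value uses a large-sample normal approximation, and the function and variable names are hypothetical.

```python
import numpy as np
from math import erf, sqrt

def ols_with_se(X, y):
    """OLS coefficients and conventional standard errors."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (n - k)
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef, se

def simultaneity_test(y2, y1, Z):
    """Regression-based (Hausman-type) test of whether y1 is endogenous in
    y2 = b0 + b1*y1 + u; Z holds the exogenous variables incl. a constant."""
    n = len(y2)
    # Reduced-form regression of y1 on the exogenous variables
    g, *_ = np.linalg.lstsq(Z, y1, rcond=None)
    v_hat = y1 - Z @ g
    # Structural regression augmented with the reduced-form residual
    X = np.column_stack([np.ones(n), y1, v_hat])
    coef, se = ols_with_se(X, y2)
    t = coef[2] / se[2]
    p = 1.0 - erf(abs(t) / sqrt(2.0))   # two-sided p value, normal approximation
    return t, p

# Hypothetical data: one disturbance correlated with y1, one not
rng = np.random.default_rng(42)
n = 500
x1, x2, v = rng.normal(size=(3, n))
y1 = 1.0 + 2.0 * x1 + 1.5 * x2 + v
Z = np.column_stack([np.ones(n), x1, x2])
u_endog = 0.5 * v + rng.normal(size=n)   # simultaneity present
u_exog = rng.normal(size=n)              # no simultaneity
t_endog, p_endog = simultaneity_test(-3.0 + 0.8 * y1 + u_endog, y1, Z)
t_exog, p_exog = simultaneity_test(-3.0 + 0.8 * y1 + u_exog, y1, Z)
print(t_endog, p_endog, t_exog, p_exog)
```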


Hypothesis Testing 


Suppose we want to test the hypothesis that income has no effect on money supply. Can we test this hypothesis with the usual t test from the estimated regression (20.5.2)? Yes, provided the sample is large and provided we correct the standard errors as shown in Eq. (20.5.3), we can use the t test to test the significance of an individual coefficient and the F test to test the joint significance of two or more coefficients, using formula (8.4.7).¹⁷
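Following footnote 17, the F statistic for $m$ restrictions can be written in the familiar restricted-versus-unrestricted form, with superscripts marking which values of the regressors are used. This is a sketch of the footnote's rule, assuming $n$ is the sample size, $k$ the number of parameters in the unrestricted equation, and that the denominator RSS is evaluated at the unrestricted 2SLS estimates, as in the standard-error correction behind Eq. (20.5.3):

$$
F = \frac{\left(\text{RSS}_R^{\hat{Y}} - \text{RSS}_{UR}^{\hat{Y}}\right)/m}{\text{RSS}_{UR}^{Y}/(n-k)}
$$

Here $\text{RSS}^{\hat{Y}}$ denotes a residual sum of squares from a Stage 2 regression that uses the predicted $\hat{Y}$, and $\text{RSS}^{Y}_{UR}$ is computed with the actual values of the regressors.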

What happens if the error term in a structural equation is autocorrelated and/or correlated with the error 
term in another structural equation in the system? A full answer to this question will take us beyond the 
scope of the book and is better left for the references (see the reference given in footnote 7). Nevertheless, 
estimation techniques (such as Zellner’s SURE technique) do exist to handle these complications. 

To conclude the discussion of our numerical example, it may be added that the various steps involved in the application of 2SLS are now routinely handled by software packages such as STATA and EViews. We showed the details of 2SLS only for pedagogical reasons. See Exercise 20.15.


20.6 Illustrative Examples 


In this section we consider some applications of the simultaneous-equation methods. 


Example 20.1 Advertising, Concentration, and Price Margins 


To study the interrelationships among advertising, concentration (as measured by the concentration ratio), and price-cost margins, Allyn D. Strickland and Leonard W. Weiss¹⁸ formulated the following three-equation model.


Advertising intensity function:

$$Ad/S = a_0 + a_1M + a_2(CD/S) + a_3C + a_4C^2 + a_5Gr + a_6Dur \tag{20.6.1}$$


¹⁷But take this precaution: The restricted and unrestricted RSS in the numerator must be calculated using predicted Y (as in Stage 2 of 2SLS), and the RSS in the denominator is calculated using actual rather than predicted values of the regressors. For an accessible discussion of this point, see T. Dudley Wallace and J. Lew Silver, Econometrics: An Introduction, Addison-Wesley, Reading, Mass., 1988, Sec. 8.5.


¹⁸See their “Advertising, Concentration, and Price-Cost Margins,” Journal of Political Economy, vol. 84, no. 5, 1976, pp. 1109–1121.


Concentration function:

$$C = b_0 + b_1(Ad/S) + b_2(MES/S) \tag{20.6.2}$$

Price-cost margin function:

$$M = c_0 + c_1(K/S) + c_2Gr + c_3C + c_4GD + c_5(Ad/S) + c_6(MES/S) \tag{20.6.3}$$

where Ad = advertising expense
S = value of shipments
C = four-firm concentration ratio
CD = consumer demand
MES = minimum efficient scale
M = price/cost margin
Gr = annual rate of growth of industrial production
Dur = dummy variable for durable goods industry
K = capital stock
GD = measure of geographic dispersion of output

By the order conditions for identifiability, Eq. (20.6.2) is overidentified, whereas Eqs. (20.6.1) and (20.6.3) are exactly identified.

The data for the analysis came largely from the 1963 Census of Manufacturers and covered 408 of the 417 four-digit manufacturing industries. The three equations were first estimated by OLS, yielding the results shown in Table 20.3. To correct for the simultaneous-equation bias, the authors reestimated the model using 2SLS. The ensuing results are given in Table 20.4. We leave it to the reader to compare the two results.


Table 20.3 OLS Estimates of Three Equations (t ratios in parentheses)

                                Dependent Variable
              Ad/S                C                   M
              Eq. (20.6.1)        Eq. (20.6.2)        Eq. (20.6.3)
Constant      −0.0314 (−7.45)     0.2638 (25.93)      0.1682 (17.15)
C             0.0554 (3.56)                           0.0629 (2.89)
C²            −0.0568 (−3.38)
M             0.1123 (9.84)
CD/S          0.0257 (8.94)
Gr            0.0387 (1.64)                           0.2255 (2.61)
Dur           −0.0021 (−1.11)
Ad/S                              [illegible]         1.6536 (11.00)
MES/S                             4.1852 (18.99)      0.0686 (0.54)
K/S                                                   0.1123 (8.03)
GD                                                    −0.0003 (−2.90)
R²            [illegible]         0.485               0.402
Degrees of
freedom       401                 405                 401
Table 20.4 Two-Stage Least-Squares Estimates of Three Equations (t ratios in parentheses)

                                Dependent Variable
              Ad/S                C                   M
              Eq. (20.6.1)        Eq. (20.6.2)        Eq. (20.6.3)
Constant      −0.0245 (−3.86)     0.2591 (21.30)      0.1736 (14.66)
C             0.0737 (2.84)                           0.0377 (0.93)
C²            −0.0643 (−2.64)
M             0.0544 (2.01)
CD/S          0.0269 (8.96)
Gr            0.0539 (2.09)                           0.2336 (2.61)
Dur           −0.0018 (−0.93)
Ad/S                              1.5347 (2.42)       1.6256 (5.52)
MES/S                             4.169 (18.84)       0.1720 (0.92)
K/S                                                   0.1165 (7.30)
GD                                                    −0.0003 (−2.79)
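The order-condition counts behind the identification statement for this model can be checked mechanically. The sketch below encodes which variables appear in each of Eqs. (20.6.1) to (20.6.3); the encoding and names are ours. One assumption matters: $C^2$ is counted as an additional endogenous variable, separate from $C$. With that convention the counts reproduce the classification given in the text, whereas counting $C$ and $C^2$ as a single variable would make Eq. (20.6.1) appear overidentified.

```python
# Variables appearing in each equation of the Strickland-Weiss model (our encoding).
# "C2" stands for C squared and is treated as its own endogenous variable (assumption).
ENDOGENOUS = {"Ad/S", "C", "C2", "M"}
PREDETERMINED = {"CD/S", "Gr", "Dur", "MES/S", "K/S", "GD"}

EQUATIONS = {
    "20.6.1": {"Ad/S", "M", "CD/S", "C", "C2", "Gr", "Dur"},
    "20.6.2": {"C", "Ad/S", "MES/S"},
    "20.6.3": {"M", "K/S", "Gr", "C", "GD", "Ad/S", "MES/S"},
}

def order_condition(eq_vars):
    """Compare excluded predetermined variables with (included endogenous - 1)."""
    m = len(eq_vars & ENDOGENOUS)               # endogenous variables in the equation
    excluded = len(PREDETERMINED - eq_vars)     # predetermined variables left out
    if excluded > m - 1:
        return "overidentified"
    if excluded == m - 1:
        return "exactly identified"
    return "underidentified"

for name, eq_vars in EQUATIONS.items():
    print(name, order_condition(eq_vars))
```

The order condition is only necessary; the rank condition would still have to be checked for a full identification analysis.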


Example 20.2 Klein’s Model I 


In Example 18.6 we discussed briefly the pioneering model of Klein. Initially, the model was estimated for the 
period 1920-1941. The underlying data are given in Table 20.5; and OLS, reduced-form, and 2SLS estimates 
are given in Table 20.6. We leave it to the reader to interpret these results. 


Year*  C      P      W      I      K₋₁    X      W′     G      T

1921   41.9   12.4   25.5   −0.2   182.8  45.6   2.7    3.9    7.7
1922   45.0   16.9   29.3   1.9    182.6  50.1   2.9    3.2    3.9
1923   49.2   18.4   34.1   5.2    184.5  57.2   2.9    2.8    4.7
1924   50.6   19.4   33.9   3.0    189.7  57.1   3.1    3.5    3.8
1925   52.6   20.1   35.4   5.1    192.7  61.0   3.2    3.3    5.5
1926   55.1   19.6   37.4   5.6    197.8  64.0   3.3    3.3    7.0
1927   56.2   19.8   37.9   4.2    203.4  64.4   3.6    4.0    6.7
1928   57.3   21.1   39.2   3.0    207.6  64.5   3.7    4.2    4.2
1929   57.8   21.7   41.3   5.1    210.6  67.0   4.0    4.1    4.0
1930   55.0   15.6   37.9   1.0    215.7  61.2   4.2    5.2    7.7
1931   50.9   11.4   34.5   −3.4   216.7  53.4   4.8    5.9    7.5
1932   45.6   7.0    29.0   −6.2   213.3  44.3   5.3    4.9    8.3
1933   46.5   11.2   28.5   −5.1   207.1  45.1   5.6    3.7    5.4
1934   48.7   12.3   30.6   −3.0   202.0  49.7   6.0    4.0    6.8
1935   51.3   14.0   33.2   −1.3   199.0  54.4   6.1    4.4    7.2
1936   57.7   17.6   36.8   2.1    197.7  62.7   7.4    2.9    8.3
1937   58.7   17.3   41.0   2.0    199.8  65.0   6.7    4.3    6.7
1938   57.5   15.3   38.2   −1.9   201.8  60.9   7.7    5.3    7.4
1939   61.6   19.0   41.6   1.3    199.9  69.5   7.8    6.6    8.9
1940   65.0   21.1   45.0   3.3    201.2  75.7   8.0    7.4    9.6
1941   69.7   23.5   53.3   4.9    204.5  88.4   8.5    13.8   11.6


*Interpretation of column heads is listed in Example 18.6. 
Source: These data are taken from G. S. Maddala, Econometrics, McGraw-Hill, New York, 1977, p. 238.
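Table 20.6* OLS, Reduced-Form, and 2SLS Estimates of Klein's Model I

OLS:

$$\hat{C} = \underset{(1.203)}{16.237} + \underset{(0.091)}{0.193}P + \underset{(0.040)}{0.796}(W + W') + \underset{(0.090)}{0.089}P_{-1} \qquad R^2 = 0.978 \quad DW = 1.367$$

$$\hat{I} = \underset{(5.465)}{10.125} + \underset{(0.097)}{0.479}P + \underset{(0.100)}{0.333}P_{-1} - \underset{(0.026)}{0.112}K_{-1} \qquad R^2 = 0.919 \quad DW = 1.810$$

$$\hat{W} = \underset{(1.151)}{0.064} + \underset{(0.032)}{0.439}X + \underset{(0.037)}{0.146}X_{-1} + \underset{(0.031)}{0.130}t \qquad R^2 = 0.985 \quad DW = 1.958$$

Reduced-form:

$$\hat{P} = \underset{(10.870)}{46.383} + \underset{(0.444)}{0.813}P_{-1} - \underset{(0.067)}{0.213}K_{-1} + \underset{(0.252)}{0.015}X_{-1} + \underset{(\text{illegible})}{0.297}t - \underset{(0.385)}{0.926}T + \underset{(0.373)}{0.443}G \qquad R^2 = 0.753 \quad DW = 1.854$$

$$\widehat{W + W'} = \underset{(8.787)}{40.278} + \underset{(0.359)}{0.823}P_{-1} - \underset{(0.054)}{0.144}K_{-1} + \underset{(0.204)}{0.115}X_{-1} + \underset{(0.124)}{0.881}t - \underset{(0.311)}{0.567}T + \underset{(0.302)}{0.859}G \qquad R^2 = 0.949 \quad DW = 2.395$$

$$\hat{X} = \underset{(18.860)}{78.281} + \underset{(0.771)}{1.724}P_{-1} - \underset{(0.110)}{0.319}K_{-1} + \underset{(0.438)}{0.094}X_{-1} + \underset{(0.267)}{0.878}t - \underset{(0.669)}{0.565}T + \underset{(0.648)}{1.317}G \qquad R^2 = 0.882 \quad DW = 2.049$$

2SLS:

$$\hat{C} = \underset{(1.464)}{16.543} + \underset{(0.130)}{0.019}P + \underset{(0.044)}{0.810}(W + W') + \underset{(0.118)}{0.214}P_{-1} \qquad R^2 = 0.9726$$

$$\hat{I} = \underset{(8.361)}{20.284} + \underset{(0.191)}{0.149}P + \underset{(0.180)}{0.616}P_{-1} - \underset{(0.040)}{0.157}K_{-1} \qquad R^2 = 0.8643$$

$$\hat{W} = \underset{(1.894)}{0.065} + \underset{(0.065)}{0.438}X + \underset{(0.070)}{0.146}X_{-1} + \underset{(0.053)}{0.130}t \qquad R^2 = 0.9852$$

*Interpretation of variables is given in Example 18.6 (standard errors in parentheses).
Source: G. S. Maddala, Econometrics, McGraw-Hill, New York, 1977, p. 242.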


Example 20.3 The Capital Asset Pricing Model Expressed as a Recursive System 


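In a rather unusual application of recursive simultaneous-equation modeling, Cheng F. Lee and W. P. Lloyd¹⁹ estimated the following model for the oil industry:

$$
\begin{aligned}
R_{1t} &= \alpha_1 + \gamma_1 M_t + u_{1t}\\
R_{2t} &= \alpha_2 + \beta_{21}R_{1t} + \gamma_2 M_t + u_{2t}\\
R_{3t} &= \alpha_3 + \beta_{31}R_{1t} + \beta_{32}R_{2t} + \gamma_3 M_t + u_{3t}\\
R_{4t} &= \alpha_4 + \beta_{41}R_{1t} + \beta_{42}R_{2t} + \beta_{43}R_{3t} + \gamma_4 M_t + u_{4t}\\
R_{5t} &= \alpha_5 + \beta_{51}R_{1t} + \beta_{52}R_{2t} + \beta_{53}R_{3t} + \beta_{54}R_{4t} + \gamma_5 M_t + u_{5t}\\
R_{6t} &= \alpha_6 + \beta_{61}R_{1t} + \beta_{62}R_{2t} + \beta_{63}R_{3t} + \beta_{64}R_{4t} + \beta_{65}R_{5t} + \gamma_6 M_t + u_{6t}\\
R_{7t} &= \alpha_7 + \beta_{71}R_{1t} + \beta_{72}R_{2t} + \beta_{73}R_{3t} + \beta_{74}R_{4t} + \beta_{75}R_{5t} + \beta_{76}R_{6t} + \gamma_7 M_t + u_{7t}
\end{aligned}
$$


¹⁹“The Capital Asset Pricing Model Expressed as a Recursive System: An Empirical Investigation,” Journal of Financial and Quantitative Analysis, June 1976, pp. 237–249.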
where $R_{1t}$ = rate of return on security 1 (= Imperial Oil)
$R_{2t}$ = rate of return on security 2 (= Sun Oil)
⋮
$R_{7t}$ = rate of return on security 7 (= Standard of Indiana)
$M_t$ = rate of return on the market index
$u_{it}$ = disturbances (i = 1, 2, ..., 7)

Before we present the results, the obvious question is: How do we choose which is security 1, which is security 2, and so on? Lee and Lloyd answer this question purely empirically. They regress the rate of return on security j on the rates of return of the remaining six securities and observe the resulting $R^2$. Thus, there will be seven such regressions. Then they order the estimated $R^2$ values from the lowest to the highest. The security having the lowest $R^2$ is designated as security 1 and the one having the highest $R^2$ is designated as security 7. The idea behind this is intuitively simple. If the $R^2$ of the rate of return of, say, Imperial Oil is lowest with respect to the other six securities, it would suggest that this security is affected least by the movements in the returns of the other securities. Therefore, the causal ordering, if any, runs from this security to the others, and there is no feedback from the other securities.
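The Lee–Lloyd ordering rule can be sketched as follows, using simulated returns for four hypothetical securities (A, B, C, D) rather than the oil-industry data. Each security's return is regressed on the returns of all the others, and the securities are then ranked from lowest to highest $R^2$, the lowest becoming security 1. The common factor `m` and all noise levels are assumptions; security D is built to be unrelated to the others, so it should come first.

```python
import numpy as np

def r2_on_others(R, j):
    """R^2 from regressing column j of R on all other columns plus a constant."""
    y = R[:, j]
    X = np.column_stack([np.ones(len(y)), np.delete(R, j, axis=1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def lee_lloyd_order(R, names):
    """Rank securities from lowest R^2 (security 1) to highest."""
    r2 = [r2_on_others(R, j) for j in range(R.shape[1])]
    ranked = sorted(zip(r2, names))
    return [name for _, name in ranked], [round(float(v), 3) for v, _ in ranked]

rng = np.random.default_rng(5)
n = 1000
m = rng.normal(size=n)                 # hypothetical common factor
A = m + 2.0 * rng.normal(size=n)       # loosely tied to the others
B = m + 1.0 * rng.normal(size=n)
C = m + 0.5 * rng.normal(size=n)       # closely tied to the others
D = rng.normal(size=n)                 # unrelated to the others
ranking, r2_values = lee_lloyd_order(np.column_stack([A, B, C, D]), ["A", "B", "C", "D"])
print(ranking, r2_values)
```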

Although one may object to such a purely empirical approach to causal ordering, let us present their 
empirical results nonetheless, which are given in Table 20.7. 


Table 20.7 Recursive System Estimates for the Oil Industry

Linear Form

                                                  Dependent Variables
                 Standard     Shell        Phillips     Union        Standard     Sun          Imperial
                 of Indiana   Oil          Petroleum    Oil          of Ohio      Oil          Oil
Shell Oil        0.2100*
                 (2.859)
Phillips         0.2293*      0.0791
Petroleum        (2.176)      (1.065)
Union Oil        0.1754*      0.2171*      [illegible]
                 (2.472)      (3.177)      [illegible]
Standard         0.0794       0.0147       0.4248*      0.1468*
of Ohio          [illegible]  [illegible]  (5.501)      [illegible]
Sun Oil          0.1249       0.1710*      0.0472       0.1339       0.0499
                 (1.343)      (1.843)      (0.355)      (0.908)      (0.271)
Imperial Oil     −0.1077      0.0526       0.0354       0.1580       −0.2541*     0.0828
                 (−1.412)     (0.6804)     (0.319)      (1.290)      (−1.691)     (0.971)
Constant         0.0868       −0.0384      −0.0127      −0.2034      0.3009       0.2013       0.3710*
                 (0.681)      (1.296)      (−0.068)     (0.986)      (1.204)      (1.399)      (2.161)
Market index     0.3681*      0.4997*      0.2884       0.7609*      0.9089*      0.7161*      0.6432*
                 (2.165)      (3.039)      (1.232)      (3.069)      (3.094)      (4.783)      (3.774)
R²               0.5020       0.4658       0.4106       0.2532       0.0985       0.2404       0.1247
Durbin–Watson    2.1083       2.4714       2.2306       2.3468       2.2181       2.3109       1.9592
*Denotes significance at 0.10 level or better for two-tailed test. 
Note: The t values appear in parentheses beneath the coefficients. 


Source: Cheng F. Lee and W. P. Lloyd, op. cit., Table 3b. 
In Exercise 5.5 we introduced the characteristic line of modern investment theory, which is simply the regression of the rate of return on security i on the market rate of return. The slope coefficient, known as the beta coefficient, is a measure of the volatility of the security's return. What the Lee–Lloyd regression results suggest is that there are significant intra-industry relationships between security returns, apart from the common market influence represented by the market portfolio. Thus, Standard of Indiana's return depends not only on the market rate of return but also on the rates of return on Shell Oil, Phillips Petroleum, and Union Oil. To put the matter differently, the movement in the rate of return on Standard of Indiana can be better explained if, in addition to the market rate of return, we also consider the rates of return experienced by Shell Oil, Phillips Petroleum, and Union Oil.
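The contrast between the characteristic line and the Lee–Lloyd specification can be illustrated with simulated data. In this sketch, which is not based on the oil-industry sample, a hypothetical security's return depends on the market return and on a peer security's return; the characteristic line (market only) is compared with a regression that also includes the peer's return. All parameter values and names are assumptions.

```python
import numpy as np

def fit(y, *regressors):
    """OLS of y on a constant and the given regressors; returns (coef, R^2)."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(11)
n = 1000
r_m = rng.normal(0.0, 4.0, n)                  # market return (hypothetical)
r_peer = 0.8 * r_m + rng.normal(0.0, 3.0, n)   # a peer security's return
r_i = 0.2 + 0.6 * r_m + 0.4 * r_peer + rng.normal(0.0, 2.0, n)

coef_char, r2_char = fit(r_i, r_m)             # characteristic line: beta = coef_char[1]
coef_aug, r2_aug = fit(r_i, r_m, r_peer)       # add the intra-industry return
print(coef_char[1], r2_char, coef_aug, r2_aug)
```

Because the peer's return is itself correlated with the market, the characteristic-line beta (about 0.6 + 0.4 × 0.8 = 0.92 under these assumed values) absorbs part of the intra-industry effect, while the augmented regression separates the two.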


Example 20.4 Revised Form of St. Louis Model²⁰


The well-known, and often controversial, St. Louis model, originally developed in the late 1960s, has been revised from time to time. One such revision is given in Table 20.8, and the empirical results based on this revised model are given in Table 20.9. (Note: A dot over a variable means the growth rate of that variable.) The model basically consists of Eqs. (1), (2), (4), and (5) in Table 20.8, the other equations representing definitions. Equation (1) was estimated by OLS. Equations (1), (2), and (4) were estimated using the Almon distributed-lag method with (endpoint) constraints on the coefficients. Where relevant, the equations were corrected for first-order ($\rho_1$) and/or second-order ($\rho_2$) serial correlation.
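The Almon (polynomial distributed-lag) idea can be sketched as follows. This minimal version assumes a second-degree polynomial for the lag weights, $\beta_i = a_0 + a_1 i + a_2 i^2$ for $i = 0, \dots, k$, and, unlike the St. Louis estimates, imposes no endpoint constraints; the data and lag weights are simulated and hypothetical. The regression is run on the constructed variables $Z_{jt} = \sum_i i^j x_{t-i}$, and the lag weights are then recovered from the estimated $a_j$.

```python
import numpy as np

def almon_lag(y, x, k, degree=2):
    """Almon distributed lag of y on x_t, ..., x_{t-k} with lag weights
    beta_i = a_0 + a_1*i + ... + a_degree*i**degree (no endpoint constraints)."""
    T = len(x)
    # Column i holds x_{t-i} for t = k, ..., T-1
    lagged = np.column_stack([x[k - i:T - i] for i in range(k + 1)])
    # powers[i, j] = i**j, so lagged @ powers gives Z_j = sum_i i**j * x_{t-i}
    powers = np.array([[i ** j for j in range(degree + 1)] for i in range(k + 1)], dtype=float)
    Z = np.column_stack([np.ones(T - k), lagged @ powers])
    coef, *_ = np.linalg.lstsq(Z, y[k:], rcond=None)
    betas = powers @ coef[1:]            # implied lag weights beta_0, ..., beta_k
    return coef[0], betas

# Hypothetical data whose true lag weights lie exactly on a quadratic
rng = np.random.default_rng(7)
T, k = 400, 4
x = rng.normal(size=T)
true_betas = np.array([0.40, 0.42, 0.38, 0.28, 0.12])   # a0=0.40, a1=0.05, a2=-0.03
lagged = np.column_stack([x[k - i:T - i] for i in range(k + 1)])
y = np.full(T, np.nan)                   # the first k values are never used
y[k:] = 2.0 + lagged @ true_betas + 0.1 * rng.normal(size=T - k)
const, betas = almon_lag(y, x, k)
print(const, betas.round(3), betas.sum().round(3))
```

Estimating three polynomial coefficients instead of five free lag weights is what makes the method attractive when the lagged regressors are highly collinear.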


Table 20.8 The St. Louis Model

$$
\begin{aligned}
&(1)\quad \dot{Y}_t = C1 + \sum_{i=0}^{4} CM_i(\dot{M}_{t-i}) + \sum_{i=0}^{4} CE_i(\dot{E}_{t-i}) + \varepsilon_{1t}\\
&(2)\quad \dot{P}_t = C2 + \sum_{i=1}^{4} CPE_i(\dot{PE}_{t-i}) + \sum_{i=0}^{5} CD_i(\dot{X}_{t-i} - \dot{XF}_{t-i}) + CPA(PA_t) + CDUM1(DUM1_t) + CDUM2(DUM2_t) + \varepsilon_{2t}\\
&(3)\quad PA_t = \sum_{i=1}^{21} CPRL_i(\dot{P}_{t-i})\\
&(4)\quad RL_t = C3 + \sum_{i=0} CPRL_i(\dot{P}_{t-i}) + \varepsilon_{3t}\\
&(5)\quad U_t - UF_t = CG(GAP_t) + CG1(GAP_{t-1}) + \varepsilon_{4t}\\
&(6)\quad Y_t = (P_t/100)(X_t)\\
&(7)\quad \dot{Y}_t = [(Y_t/Y_{t-1})^4 - 1]100\\
&(8)\quad \dot{X}_t = [(X_t/X_{t-1})^4 - 1]100\\
&(9)\quad \dot{P}_t = [(P_t/P_{t-1})^4 - 1]100\\
&(10)\quad GAP_t = [(XF_t - X_t)/XF_t]100\\
&(11)\quad \dot{XF}_t = [(XF_t/XF_{t-1})^4 - 1]100
\end{aligned}
$$

Y = nominal GNP
M = money stock (M1)
E = high-employment expenditures
P = GNP deflator (1972 = 100)
PE = relative price of energy
X = output in 1972 dollars
XF = potential output (Rasche/Tatom)
RL = corporate bond rate
U = unemployment rate
UF = unemployment rate at full employment
DUM1 = control dummy (1971-III to 1973-I = 1; 0 elsewhere)
DUM2 = postcontrol dummy (1973-II to 1975-I = 1; 0 elsewhere)


Source: Federal Reserve Bank of St. Louis, Review, May 1982, p. 14. 


²⁰Federal Reserve Bank of St. Louis, Review, May 1982, p. 14.
Examining the results, we observe that it is the rate of growth in the money supply, and not the rate of growth in high-employment expenditures, that primarily determines the rate of growth of (nominal) GNP. The sum of the M coefficients is 1.06, suggesting that a 1 percent (sustained) increase in the money supply on average leads to about a 1.06 percent increase in nominal GNP. On the other hand, the sum of the E coefficients, about 0.05, suggests that a change in high-employment government expenditure has little impact on the rate of growth of nominal GNP. It is left to the reader to interpret the results of the other regressions reported in Table 20.9.
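The cumulative effects quoted above are simply the sums of the estimated lag coefficients in Eq. (1) of Table 20.9:

$$
\sum_{i=0}^{4}\widehat{CM}_i = 0.40 + 0.39 + 0.22 + 0.06 - 0.01 = 1.06, \qquad
\sum_{i=0}^{4}\widehat{CE}_i = 0.06 + 0.02 - 0.02 - 0.02 + 0.01 = 0.05
$$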


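Table 20.9 In-Sample Estimation: 1960-I to 1980-IV (absolute value of t statistic in parentheses)

$$
\begin{aligned}
(1)\quad \dot{Y}_t = {} & \underset{(2.15)}{2.44} + \underset{(3.38)}{0.40}\dot{M}_t + \underset{(5.06)}{0.39}\dot{M}_{t-1} + \underset{(2.18)}{0.22}\dot{M}_{t-2} + \underset{(0.82)}{0.06}\dot{M}_{t-3} - \underset{(0.11)}{0.01}\dot{M}_{t-4}\\
& + \underset{(1.46)}{0.06}\dot{E}_t + \underset{(0.63)}{0.02}\dot{E}_{t-1} - \underset{(0.57)}{0.02}\dot{E}_{t-2} - \underset{(0.52)}{0.02}\dot{E}_{t-3} + \underset{(0.34)}{0.01}\dot{E}_{t-4}\\
& R^2 = 0.39 \quad \text{se} = 3.50 \quad DW = 2.02
\end{aligned}
$$

$$
\begin{aligned}
(2)\quad \dot{P}_t = {} & \underset{(2.53)}{0.96} + \underset{(0.75)}{0.01}\dot{PE}_{t-1} + \underset{(1.96)}{0.04}\dot{PE}_{t-2} - \underset{(0.73)}{0.01}\dot{PE}_{t-3} + \underset{(1.38)}{0.02}\dot{PE}_{t-4}\\
& - \underset{(0.18)}{0.00}(\dot{X}_t - \dot{XF}_t) + \underset{(1.43)}{0.01}(\dot{X}_{t-1} - \dot{XF}_{t-1}) + \underset{(4.63)}{0.02}(\dot{X}_{t-2} - \dot{XF}_{t-2})\\
& + \underset{(3.00)}{0.02}(\dot{X}_{t-3} - \dot{XF}_{t-3}) + \underset{(2.42)}{0.02}(\dot{X}_{t-4} - \dot{XF}_{t-4}) + \underset{(2.16)}{[\text{illegible}]}(\dot{X}_{t-5} - \dot{XF}_{t-5})\\
& + \underset{(10.49)}{1.03}(PA_t) - \underset{(1.02)}{0.61}(DUM1_t) + \underset{(2.71)}{1.65}(DUM2_t)\\
& R^2 = 0.80 \quad \text{se} = 1.28 \quad DW = 1.97 \quad \hat{\rho} = [\text{illegible}]
\end{aligned}
$$

$$
\begin{aligned}
(4)\quad RL_t = {} & \underset{(3.12)}{2.97} + \underset{(5.22)}{0.96}\sum_i \dot{P}_{t-i}\\
& R^2 = 0.32 \quad \text{se} = 0.33 \quad DW = 1.76 \quad \hat{\rho} = 0.94
\end{aligned}
$$

$$
\begin{aligned}
(5)\quad U_t - UF_t = {} & \underset{(11.89)}{0.28}(GAP_t) + \underset{(6.31)}{0.14}(GAP_{t-1})\\
& R^2 = 0.63 \quad \text{se} = [\text{illegible}] \quad DW = 1.95 \quad \hat{\rho}_1, \hat{\rho}_2 = [\text{illegible}]
\end{aligned}
$$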


Source: Federal Reserve Bank of St. Louis, Review, May 1982, p. 14. 


Summary and Conclusions 


1. Assuming that an equation in a simultaneous-equation model is identified (either exactly or over-), we 
have several methods to estimate it. 

2. These methods fall into two broad categories: Single-equation methods and systems methods. 

3. For reasons of economy, specification errors, etc., the single-equation methods are by far the most popular. A unique feature of these methods is that one can estimate a single equation in a multiequation model without worrying too much about other equations in the system. (Note: For identification purposes, however, the other equations in the system count.)

4. Three commonly used single-equation methods are OLS, ILS, and 2SLS. 
5. Although OLS is, in general, inappropriate in the context of simultaneous-equation models, it can be applied to the so-called recursive models, in which there is a definite but unidirectional cause-and-effect relationship among the endogenous variables.

6. The method of ILS is suited for just, or exactly, identified equations. In this method OLS is applied to the reduced-form equations, and it is from the reduced-form coefficients that one estimates the original structural coefficients.

7. The method of 2SLS is especially designed for overidentified equations, although it can also be applied to exactly identified equations, in which case the results of 2SLS and ILS are identical. The basic idea behind 2SLS is to replace the (stochastic) endogenous explanatory variable by a linear combination of the predetermined variables in the model and use this combination as the explanatory variable in lieu of the original endogenous variable. The 2SLS method thus resembles the instrumental variable method of estimation in that the linear combination of the predetermined variables serves as an instrument, or proxy, for the endogenous regressor.

8. A noteworthy feature of both ILS and 2SLS is that the estimates obtained are consistent, that is, as the sample size increases indefinitely, the estimates converge to their true population values. The estimates may not satisfy small-sample properties, such as unbiasedness and minimum variance. Therefore, the results obtained by applying these methods to small samples and the inferences drawn from them should be interpreted with due caution.
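Points 7 and 8 can be illustrated with a small simulation of a demand-and-supply model of the kind used in Section 20.3, assumed here to be demand $Q = \alpha_0 + \alpha_1 P + \alpha_2 X + u_1$ and supply $Q = \beta_0 + \beta_1 P + u_2$. The supply equation is exactly identified, so ILS ($\hat{\beta}_1 = \hat{\Pi}_3/\hat{\Pi}_1$, the ratio of the reduced-form slopes of Q and P on X) and 2SLS should coincide, and both should approach the true $\beta_1$ as the sample grows, while OLS of Q on P should not. The parameter values, the distribution of X, and the function names are assumptions made for the illustration; only the Python standard library is used.

```python
import random

def slope(x, y):
    """OLS slope of y on x (with intercept), from deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(n, seed=3):
    """Draw (X, P, Q) from the assumed demand-supply model at market equilibrium."""
    a0, a1, a2 = 10.0, -1.0, 0.5   # assumed demand parameters
    b0, b1 = 2.0, 1.5              # assumed supply parameters (b1 is the target)
    rng = random.Random(seed)
    X, P, Q = [], [], []
    for _ in range(n):
        x = rng.uniform(10.0, 20.0)
        u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        p = (a0 - b0 + a2 * x + u1 - u2) / (b1 - a1)   # demand = supply
        X.append(x)
        P.append(p)
        Q.append(b0 + b1 * p + u2)
    return X, P, Q

def estimates(n):
    """Return (ILS, 2SLS, OLS) estimates of the supply slope b1."""
    X, P, Q = simulate(n)
    pi1, pi3 = slope(X, P), slope(X, Q)        # reduced-form slopes
    b_ils = pi3 / pi1
    mx, mp = sum(X) / n, sum(P) / n
    p_hat = [mp + pi1 * (x - mx) for x in X]   # stage 1: fitted prices
    b_2sls = slope(p_hat, Q)                   # stage 2
    b_ols = slope(P, Q)                        # ignores simultaneity
    return b_ils, b_2sls, b_ols

results = {n: estimates(n) for n in (25, 250, 2500, 25000)}
for n, (b_ils, b_2sls, b_ols) in results.items():
    print(f"n={n:>6}  ILS={b_ils:.3f}  2SLS={b_2sls:.3f}  OLS={b_ols:.3f}")
```

In every run the ILS and 2SLS columns agree, as point 7 states for an exactly identified equation; as n grows both settle near the assumed $\beta_1 = 1.5$, whereas the OLS slope settles near a different value because P is correlated with $u_2$.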


Multiple Choice Questions 


1. The method in which each equation in the system of simultaneous equations is estimated individually, taking into account any restrictions placed on that equation without worrying about the restrictions placed on the other equations in the system, is known as
a. Simultaneous-equation estimation method
b. Single-equation method
c. Full information method
d. Two-stage least-squares method
2. Estimating all the equations in the SEM simultaneously is known as
a. Simultaneous-equation estimation method
b. Single-equation method
c. Full information method
d. Two-stage least-squares method
3. In which of the following methods of estimation are specification errors present in one or more equations of the system transmitted to the rest of the system?
a. Simultaneous-equation estimation method
b. Single-equation method
c. Full information method
d. Two-stage least-squares method
4. Indirect least squares and two-stage least squares are examples of
a. Single-equation method
b. Systems method
c. Simultaneous-equation method
d. Full information method
5. In SEMs, OLS can be applied if
a. It is a recursive model
b. The order condition is satisfied
c. The rank condition is satisfied
d. Both the order and rank conditions are satisfied
6. The indirect least-squares procedure of estimation is appropriate when simultaneous equations are
a. Overidentified
b. Underidentified
c. Exactly identified
d. Identified
7. In estimating an SEM by the indirect least-squares method,
a. GLS is applied to the reduced-form equation
b. GLS is applied to the structural equation
c. OLS is applied to the reduced-form equation
d. OLS is applied to the structural equation
8. The ILS estimators are
a. Unbiased in small samples
b. Biased in large samples
c. Asymptotically efficient
d. BLUE in small samples and large samples
9. Which of the following procedures is most appropriate for estimation of an overidentified SEM?
a. OLS
b. ILS
c. 2SLS
d. GLS
10. In estimating an SEM, we find proxy variables for the stochastic explanatory variables. These proxy variables are known as
a. Coefficients of the reduced form
b. Parameters of the structural form
c. Instrumental variables
d. Explanatory variables
11. The method developed by Henri Theil and Robert Basmann for estimating an SEM by using proxy explanatory variables is known as the
a. OLS method
b. ILS method
c. 2SLS method
d. MLE method
12. The classical OLS estimates and 2SLS estimates of an SEM will be very close to each other if, in the reduced-form regressions,
a. The standard errors of the estimates are small
b. All the coefficients are statistically significant
c. The $R^2$ value is very high
d. The coefficient of correlation between the explanatory variables is low
13. In the 2SLS procedure, the $\hat{Y}$'s used in the second stage are good proxies for the original Y values if, in the first-stage regression,
a. The standard errors of the estimates are small
b. All the coefficients are statistically significant
c. The $R^2$ value is very high
d. The coefficient of correlation between the explanatory variables is low
14. Under the 2SLS estimation technique, the significance of an individual coefficient is tested using
a. Student's t test
b. F test
c. Chi-square test
d. Modified t test
15. Under the 2SLS estimation technique, the joint significance of two or more coefficients is tested using
a. Student's t test
b. F test
c. Chi-square test
d. Modified F test


Exercises 


Questions 


20.1. State whether each of the following statements is true or false:
a. The method of OLS is not applicable to estimate a structural equation in a simultaneous-equation model.
b. In case an equation is not identified, 2SLS is not applicable.
c. The problem of simultaneity does not arise in a recursive simultaneous-equation model.
d. The problems of simultaneity and exogeneity mean the same thing.
e. The 2SLS and other methods of estimating structural equations have desirable statistical properties only in large samples.
f. There is no such thing as an $R^2$ for the simultaneous-equation model as a whole.
g. The 2SLS and other methods of estimating structural equations are not applicable if the equation errors are autocorrelated and/or are correlated across equations.
h. If an equation is exactly identified, ILS and 2SLS give identical results.
20.2. Why is it unnecessary to apply the two-stage least-squares method to exactly identified equations? 
20.3. Consider the following modified Keynesian model of income determination:

$$
\begin{aligned}
C_t &= \beta_{10} + \beta_{11}Y_t + u_{1t}\\
I_t &= \beta_{20} + \beta_{21}Y_t + \beta_{22}Y_{t-1} + u_{2t}\\
Y_t &= C_t + I_t + G_t
\end{aligned}
$$

where C = consumption expenditure
I = investment expenditure
Y = income
G = government expenditure
$G_t$ and $Y_{t-1}$ are assumed predetermined.

a. Obtain the reduced-form equations and determine which of the preceding equations are identified (either just or over-).
b. Which method will you use to estimate the parameters of the overidentified equation and of the exactly identified equation? Justify your answer.

*Optional.

20.4. Consider the following results:*

$$
\begin{aligned}
\text{OLS:}\quad & \hat{W}_t = 0.276 + 0.258P_t + 0.046P_{t-1} + 4.959V_t && R^2 = 0.924\\
\text{OLS:}\quad & \hat{P}_t = 2.693 + 0.232W_t - 0.544X_t + 0.247M_t + 0.064M_{t-1} && R^2 = 0.982\\
\text{2SLS:}\quad & \hat{W}_t = 0.272 + 0.257P_t + 0.046P_{t-1} + 4.966V_t && R^2 = 0.920\\
\text{2SLS:}\quad & \hat{P}_t = 2.686 + 0.233W_t - 0.544X_t + 0.246M_t + 0.046M_{t-1} && R^2 = 0.981
\end{aligned}
$$

where $W_t$, $P_t$, $M_t$, and $X_t$ are percentage changes in earnings, prices, import prices, and labor productivity (all percentage changes are over the previous year), respectively, and where $V_t$ represents unfilled job vacancies (percentage of total number of employees).

“Since the OLS and 2SLS results are practically identical, 2SLS is meaningless.” Comment.

20.5.† Assume that production is characterized by the Cobb–Douglas production function


$$Q_i = AK_i^{\alpha}L_i^{\beta}$$

where Q = output
K = capital input
L = labor input
A, α, and β = parameters
i = ith firm
Given the price of final output P, the price of labor W, and the price of capital R, and assuming profit maximization, we obtain the following empirical model of production:

Production function:

$$\ln Q_i = \ln A + \alpha\ln K_i + \beta\ln L_i + \ln u_{1i} \tag{1}$$

Marginal product of labor function:

$$\ln Q_i = -\ln\beta + \ln L_i + \ln\frac{W}{P} + \ln u_{2i} \tag{2}$$

Marginal product of capital function:

$$\ln Q_i = -\ln\alpha + \ln K_i + \ln\frac{R}{P} + \ln u_{3i} \tag{3}$$

where $u_1$, $u_2$, and $u_3$ are stochastic disturbances.
In the preceding model there are three equations in three endogenous variables Q, L, and K. P, R, and W are exogenous.
a. What problems do you encounter in estimating the model if α + β = 1, that is, when there are constant returns to scale?
b. Even if α + β ≠ 1, can you estimate the equations? Answer by considering the identifiability of the system.


*Source: Prices and Earnings in 1951–1969: An Econometric Assessment, Department of Employment, United Kingdom, Her Majesty's Stationery Office, London, 1971, p. 30.

†Optional.
c. If the system is not identified, what can be done to make it identifiable?
Note: Equations (2) and (3) are obtained by differentiating Q with respect to labor and capital, respectively, setting them equal to W/P and R/P, transforming the resulting expressions into logarithms, and adding (the logarithm of) the disturbance terms.

20.6. Consider the following demand-and-supply model for money:

Demand for money: $M_t^d = \beta_0 + \beta_1 Y_t + \beta_2 R_t + \beta_3 P_t + u_{1t}$
Supply of money: $M_t^s = \alpha_0 + \alpha_1 Y_t + u_{2t}$

where M = money
Y = income
R = rate of interest
P = price

Assume that R and P are predetermined.
a. Is the demand function identified?
b. Is the supply function identified?
c. Which method would you use to estimate the parameters of the identified equation(s)? Why?
d. Suppose we modify the supply function by adding the explanatory variables $Y_{t-1}$ and $M_{t-1}$. What happens to the identification problem? Would you still use the method you used in (c)? Why or why not?

20.7. Refer to Exercise 18.10. For the two-equation system there, obtain the reduced-form equations and estimate their parameters. Estimate the indirect least-squares regression of consumption on income and compare your results with the OLS regression.


Empirical Exercises
20.8. Consider the following model:

$$
\begin{aligned}
R_t &= \beta_0 + \beta_1 M_t + \beta_2 Y_t + u_{1t}\\
Y_t &= \alpha_0 + \alpha_1 R_t + u_{2t}
\end{aligned}
$$

where $M_t$ (money supply) is exogenous, $R_t$ is the interest rate, and $Y_t$ is GDP.
a. How would you justify the model?
b. Are the equations identified?
c. Using the data given in Table 20.2, estimate the parameters of the identified equations. Justify the method(s) you use.
20.9. Suppose we change the model in Exercise 20.8 as follows:

$$
\begin{aligned}
R_t &= \beta_0 + \beta_1 M_t + \beta_2 Y_t + \beta_3 Y_{t-1} + u_{1t}\\
Y_t &= \alpha_0 + \alpha_1 R_t + u_{2t}
\end{aligned}
$$

a. Find out if the system is identified.
b. Using the data given in Table 20.2, estimate the parameters of the identified equation(s).
20.10. Consider the following model:

$$
\begin{aligned}
R_t &= \beta_0 + \beta_1 M_t + \beta_2 Y_t + u_{1t}\\
Y_t &= \alpha_0 + \alpha_1 R_t + \alpha_2 I_t + u_{2t}
\end{aligned}
$$

where the variables are as defined in Exercise 20.8. Treating I (domestic investment) and M as exogenous, determine the identification of the system. Using the data given in Table 20.2, estimate the parameters of the identified equation(s).
20.11. Suppose we change the model of Exercise 20.10 as follows:

$$
\begin{aligned}
R_t &= \beta_0 + \beta_1 M_t + \beta_2 Y_t + u_{1t}\\
Y_t &= \alpha_0 + \alpha_1 R_t + \alpha_2 I_t + u_{2t}\\
I_t &= \gamma_0 + \gamma_1 R_t + u_{3t}
\end{aligned}
$$

Assume that M is determined exogenously.
a. Find out which of the equations are identified.
b. Estimate the parameters of the identified equation(s) using the data given in Table 20.2. Justify your method(s).

20.12. Verify the standard errors reported in Eq. (20.5.3).

20.13. Return to the demand-and-supply model given in Eqs. (20.3.1) and (20.3.2). Suppose the supply function is altered as follows:

$$Q_t = \beta_0 + \beta_1 P_{t-1} + u_{2t}$$

where $P_{t-1}$ is the price prevailing in the previous period.
a. If X (expenditure) and $P_{t-1}$ are predetermined, is there a simultaneity problem?
b. If there is, are the demand and supply functions each identified? If they are, obtain their reduced-form equations and estimate them from the data given in Table 20.1.
c. From the reduced-form coefficients, can you derive the structural coefficients? Show the necessary computations.

20.14. Class Exercise: Consider the following simple macroeconomic model for the U.S. economy, say, for the period 1960–1999.*


Private consumption function: 
C; = do +0 Y; +arC;_) + uy; of 200 ae 
Private gross investment function: 
L = Po + Bil; + B2R: + Bsh-1+uy, Bı > 0, B < 0,0 < fp <1 
A money demand function: 
Ry = ào +A, +AaMi-1 + Ag Pp +AgRi-1 tus, Ay > 0,A2 < 0,43 > 0,0 < Ay <1 


wv 


Income identity: 
Y,=C,;+4,+G, 


where C = real private consumption; J = real gross private investment, G = real government 

expenditure, Y = real GDP, M = M2 money supply at current prices, R = long-term interest rate (%), 

and P = Consumer Price Index. The endogenous variables are C, J, R, and Y. The predetermined 

variables are: C,_,, J;.,,M,_, Pp R,_,, and G, plus the intercept term. The ws are the error terms. 

a. Using the order condition of identification, determine which of the four equations are identified, 
either exact or over-. 

b. Which method(s) do you use to estimate the identified equations? 

c. Obtain suitable data from government and/or private sources, estimate the model, and comment on 
your results, 


*Adapted from H. R. Seddighi, K. A. Lawler, and A. V. Katos, Econometrics: A Practical Approach, Routledge, New York, 2000, 


p. 204. 


20315: 


20.16. 
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In this exercise we examine data for 534 workers obtained from the Current Population Survey (CPS) 
for 1985. The data can be found as Table 20.10 on the textbook website.” The variables in this table 
are defined as follows: 

W = wages $, per hour; occup = occupation; sector = 1 for manufacturing, 2 for construction, 0 for 
other; union = | if union member, 0 otherwise; educ = years of schooling; exper = work experience 
in years; age = age in years; sex = | for female; marital status = 1 if married: race = 1 for other, 2 for 
Hispanic, 3 for white; region = 1 if lives in the South. 

Consider the following simple wage determination model: 


$$\ln W_i = \beta_1 + \beta_2\,\text{Educ}_i + \beta_3\,\text{Exper}_i + \beta_4\,\text{Exper}_i^2 + u_i \qquad (1)$$


a. Suppose education, like wages, is endogenous. How would you find out that in Equation (1) 
education is in fact endogenous? Use the data given in the table in your analysis. 

b. Does the Hausman test support your analysis in (a)? Explain fully. 

Class Exercise: Consider the following demand-and-supply model for loans of commercial banks to businesses:

Demand: $$Q_t^d = \alpha_1 + \alpha_2 R_t + \alpha_3 RD_t + \alpha_4 IPI_t + u_{1t}$$
Supply: $$Q_t^s = \beta_1 + \beta_2 R_t + \beta_3 RS_t + \beta_4 TBD_t + u_{2t}$$


where Q = total commercial bank loans ($ billion); R = average prime rate; RS = 3-month Treasury bill rate; RD = AAA corporate bond rate; IPI = Index of Industrial Production; and TBD = total bank deposits.

a. Collect data on these variables for the period 1980-2007 from various sources, such as www.economagic.com, the website of the Federal Reserve Bank of St. Louis, or any other source.

b. Are the demand and supply functions identified? List which variables are endogenous and which 
are exogenous. 

c. How would you go about estimating the demand and supply functions listed above? Show the 
necessary calculations. 

d. Why are both R and RS included in the model? What is the role of IPI in the model? 


*Data can be found on the Web, at http://lib.stat.cmu.edu/datasets/cps_85_wages. 


Key to Multiple Choice Questions 


1. (b) 2. (c) 3. (c) 4. (a) 5. (a) 6. (c) 7. (c) 8. (c) 9. (c)
10. (c) 11. (c) 12. (c) 13. (c) 14. (a) 15. (b)


Appendix 20A 


20A.1 Bias in the Indirect Least-Squares Estimators


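To show that the ILS estimators, although consistent, are biased, we use the demand-and-supply model given in Eqs. (20.3.1) and (20.3.2). From Eq. (20.3.10) we obtain

$$\hat\beta_1 = \frac{\hat\Pi_3}{\hat\Pi_1}$$

Now

$$\hat\Pi_3 = \frac{\sum q_t x_t}{\sum x_t^2} \qquad \text{[from Eq. (20.3.7)]}$$

and

$$\hat\Pi_1 = \frac{\sum p_t x_t}{\sum x_t^2}$$

where, as usual, lowercase letters denote deviations from sample means. Therefore, on substitution, we obtain

$$\hat\beta_1 = \frac{\sum q_t x_t}{\sum p_t x_t} \qquad (1)$$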
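Using Eqs. (20.3.3) and (20.3.4), we obtain

$$p_t = \Pi_1 x_t + (w_t - \bar w) \qquad (2)$$
$$q_t = \Pi_3 x_t + (v_t - \bar v) \qquad (3)$$

where $\bar w$ and $\bar v$ are the mean values of $w_t$ and $v_t$, respectively.
Substituting Eqs. (2) and (3) into Eq. (1), we obtain

$$\hat\beta_1 = \frac{\Pi_3 \sum x_t^2 + \sum (v_t - \bar v)x_t}{\Pi_1 \sum x_t^2 + \sum (w_t - \bar w)x_t} = \frac{\Pi_3 + \sum (v_t - \bar v)x_t \big/ \sum x_t^2}{\Pi_1 + \sum (w_t - \bar w)x_t \big/ \sum x_t^2} \qquad (4)$$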
Since the expectation operator E is a linear operator, whereas Eq. (4) is a ratio of random quantities, we cannot take the expectation of Eq. (4) term by term, although it is clear that

$$E(\hat\beta_1) \neq \frac{\Pi_3}{\Pi_1}$$

generally. (Why?)
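But as the sample size tends to infinity, we can obtain

$$\text{plim}\,\hat\beta_1 = \frac{\Pi_3 + \text{plim}\left[\sum (v_t - \bar v)x_t \big/ \sum x_t^2\right]}{\Pi_1 + \text{plim}\left[\sum (w_t - \bar w)x_t \big/ \sum x_t^2\right]} \qquad (5)$$

where use is made of the properties of plim, namely, that

$$\text{plim}(A + B) = \text{plim}\,A + \text{plim}\,B \qquad \text{and} \qquad \text{plim}\left(\frac{A}{B}\right) = \frac{\text{plim}\,A}{\text{plim}\,B} \quad (\text{plim}\,B \neq 0)$$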


Now as the sample size is increased indefinitely, the second term in both the denominator and the numerator of Eq. (5) tends to zero (why?), yielding

$$\text{plim}\,\hat\beta_1 = \frac{\Pi_3}{\Pi_1} = \beta_1 \qquad (6)$$

showing that, although biased, $\hat\beta_1$ is a consistent estimator of $\beta_1$.
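To see the plim result of Eq. (6) at work, here is a minimal Monte Carlo sketch in Python. It assumes NumPy is available. The parameter values, the distribution of income X, and the seed are hypothetical choices, not taken from the text, and `ils_beta1` is an illustrative helper. The code simulates the demand-and-supply model, computes the ILS estimate of β1 as the ratio of the two reduced-form slopes (Eq. (1)), and shows the estimate settling near the true β1 = 1 as n grows. It illustrates consistency only. Measuring the small-sample bias would require averaging over many replications.

```python
import numpy as np


def ils_beta1(n, rng, a0=10.0, a1=-1.0, a2=2.0, b0=2.0, b1=1.0):
    """ILS estimate of the supply slope beta_1 for one simulated sample.

    Demand: Q = a0 + a1*P + a2*X + u1;  Supply: Q = b0 + b1*P + u2.
    All parameter values are hypothetical.
    """
    x_raw = rng.normal(10.0, 2.0, n)      # exogenous income X
    u1 = rng.normal(0.0, 1.0, n)
    u2 = rng.normal(0.0, 1.0, n)
    # Market-clearing (equilibrium) price and quantity
    p_raw = (a0 - b0 + a2 * x_raw + u1 - u2) / (b1 - a1)
    q_raw = b0 + b1 * p_raw + u2
    # Deviations from sample means (the lowercase p, q, x of the text)
    x, p, q = (v - v.mean() for v in (x_raw, p_raw, q_raw))
    pi1_hat = (p @ x) / (x @ x)   # slope of the reduced-form price equation
    pi3_hat = (q @ x) / (x @ x)   # slope of the reduced-form quantity equation
    return pi3_hat / pi1_hat      # identical to sum(q*x) / sum(p*x), Eq. (1)


rng = np.random.default_rng(42)
for n in (25, 250, 25_000):
    print(n, round(ils_beta1(n, rng), 4))
```

With these settings the reduced-form slope Π1 equals a2/(b1 − a1) = 1, so the denominator is well away from zero. With a weak exogenous variable the ratio in Eq. (1) becomes much more erratic in small samples.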


20A.2 Estimation of Standard Errors of 2SLS Estimators 


The purpose of this appendix is to show that the standard errors of the estimates obtained from the second-stage regression of the 2SLS procedure, using the formula applicable in OLS estimation, are not the “proper” estimates of the “true” standard errors. To see this, we use the income and money supply model given in Eqs. (20.4.1) and (20.4.2). We estimate the parameters of the overidentified money supply function from the second-stage regression as


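$$Y_{2t} = \beta_{20} + \beta_{21}\hat Y_{1t} + u_t^* \qquad (20.4.6)$$

where

$$u_t^* = u_{2t} + \beta_{21}\hat u_{1t} \qquad (7)$$

Now when we run regression (20.4.6), the standard error of, say, $\hat\beta_{21}$ is obtained from the following expression:

$$\operatorname{var}(\hat\beta_{21}) = \frac{\hat\sigma^{*2}}{\sum \hat y_{1t}^2} \qquad (8)$$

where $\hat y_{1t}$ denotes the deviation of $\hat Y_{1t}$ from its sample mean, and

$$\hat\sigma^{*2} = \frac{\sum (\hat u_t^*)^2}{n-2} = \frac{\sum (Y_{2t} - \hat\beta_{20} - \hat\beta_{21}\hat Y_{1t})^2}{n-2} \qquad (9)$$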


But $\hat\sigma^{*2}$ is not the same thing as $\hat\sigma_2^2$, where the latter is an unbiased estimate of the true variance of $u_{2t}$. This difference can be readily verified from Eq. (7). To obtain the true (as defined previously) $\hat\sigma_2^2$, we proceed as follows:


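$$\hat u_{2t} = Y_{2t} - \hat\beta_{20} - \hat\beta_{21} Y_{1t}$$

where $\hat\beta_{20}$ and $\hat\beta_{21}$ are the estimates from the second-stage regression. Hence,

$$\hat\sigma_2^2 = \frac{\sum (Y_{2t} - \hat\beta_{20} - \hat\beta_{21} Y_{1t})^2}{n-2} \qquad (10)$$

Note the difference between Eqs. (9) and (10): In Eq. (10) we use the actual $Y_{1t}$ rather than the estimated $\hat Y_{1t}$ from the first-stage regression.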

Having estimated Eq. (10), the easiest way to correct the standard errors of coefficients estimated in the second-stage regression is to multiply each one of them by $\hat\sigma_2/\hat\sigma^*$. Note that if $Y_{1t}$ and $\hat Y_{1t}$ are very close, that is, the $R^2$ in the first-stage regression is very high, the correction factor $\hat\sigma_2/\hat\sigma^*$ will be close to 1, in which case the estimated standard errors in the second-stage regression may be taken as the true estimates. But in other situations, we shall have to use the preceding correction factor.
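The correction can be sketched numerically. The following Python code, which assumes NumPy, is a minimal illustration under hypothetical parameter values (not from the text). It simulates the income and money supply model and runs the two stages by hand, using a helper `ols` defined only for this example. It then computes σ̂*² from Eq. (9) and σ̂2² from Eq. (10), and rescales the naive second-stage standard error by σ̂2/σ̂*.

```python
import numpy as np


def ols(y, X):
    """Return OLS coefficients and residuals of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef


rng = np.random.default_rng(1)
n = 500
# Hypothetical structural parameters (not taken from the text)
b10, b11, g11, g12 = 1.0, 0.5, 1.0, 1.0   # income equation (20.4.1)
b20, b21 = 2.0, 0.8                        # money supply equation (20.4.2)

x1, x2 = rng.normal(size=n), rng.normal(size=n)   # predetermined variables
u1, u2 = rng.normal(size=n), rng.normal(size=n)   # structural disturbances

# Solve the two structural equations jointly for Y1 and Y2
y1 = (b10 + b11 * b20 + g11 * x1 + g12 * x2 + u1 + b11 * u2) / (1 - b11 * b21)
y2 = b20 + b21 * y1 + u2

const = np.ones(n)
# Stage 1: regress Y1 on all predetermined variables; keep the fitted values
_, u1_hat = ols(y1, np.column_stack([const, x1, x2]))
y1_hat = y1 - u1_hat
# Stage 2: regress Y2 on the fitted Y1, as in Eq. (20.4.6)
Z_hat = np.column_stack([const, y1_hat])
(b20_hat, b21_hat), u_star_hat = ols(y2, Z_hat)

sigma_star_sq = u_star_hat @ u_star_hat / (n - 2)   # Eq. (9): uses fitted Y1
u2_hat = y2 - b20_hat - b21_hat * y1                # residuals with actual Y1
sigma2_sq = u2_hat @ u2_hat / (n - 2)               # Eq. (10)

# Naive stage-2 standard error (equivalent to Eq. (8)), then the corrected one
se_naive = np.sqrt(sigma_star_sq * np.linalg.inv(Z_hat.T @ Z_hat)[1, 1])
se_corrected = se_naive * np.sqrt(sigma2_sq) / np.sqrt(sigma_star_sq)
print(f"b21_hat = {b21_hat:.3f}, naive se = {se_naive:.4f}, corrected se = {se_corrected:.4f}")
```

In this design the first-stage R² is only moderate, so $\hat Y_{1t}$ departs noticeably from $Y_{1t}$ and the correction factor is well below 1. With a first-stage R² close to 1 the factor would approach 1, as the text notes.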


CHAPTER 21

Time Series Econometrics:
Some Basic Concepts


We noted in Chapter 1 that one of the important types of data used in empirical analysis is time series data. 
In this and the following chapter we take a closer look at such data not only because of the frequency 
with which they are used in practice but also because they pose several challenges to econometricians and 
practitioners. 

First, empirical work based on time series data assumes that the underlying time series is stationary. 
Although we have discussed the concept of stationarity intuitively in Chapter 1, we discuss it more fully in this 
chapter. More specifically, we will try to find out what stationarity means and why one should worry about it. 

Second, in Chapter 12, on autocorrelation, we discussed several causes of autocorrelation. Sometimes 
autocorrelation results because the underlying time series is nonstationary. 

Third, in regressing a time series variable on another time series variable(s), one often obtains a very high R² (in excess of 0.9) even though there is no meaningful relationship between the two variables. Sometimes we expect no relationship between two variables, yet a regression of one on the other often shows a significant relationship. This situation exemplifies the problem of spurious, or nonsense, regression, whose nature will be explored shortly. It is therefore very important to find out whether the relationship between economic variables is genuine or spurious. We will see in this chapter how spurious regressions can arise if time series are not stationary.

Fourth, some financial time series, such as stock prices, exhibit what is known as the random walk phenomenon. This means that the price of a stock, say IBM, tomorrow equals its price today plus a purely random shock (or error term), so that the best prediction of tomorrow's price is simply today's price. If this were in fact the case, forecasting asset prices would be a futile exercise.

Fifth, regression models involving time series data are often used for forecasting. In view of the 
preceding discussion, we would like to know if such forecasting is valid if the underlying time series are not 
stationary. 

Finally, causality tests (recall the Granger and Sims causality tests discussed in Chapter 17) assume that the time series involved in the analysis are stationary. Therefore, tests of stationarity should precede tests of causality.

At the outset a disclaimer is in order. The topic of time series analysis is so vast and evolving, and some of the mathematics underlying the various techniques of time series analysis is so involved, that the best we hope to achieve in an introductory text like this is to give the reader a glimpse of some of the fundamental concepts of time series analysis. For those who want to pursue this topic further, we provide references.¹


21.1 A Look at Selected U.S. Economic Time Series 


To set the ball rolling, and to give the reader a feel for the somewhat esoteric concepts of time series analysis to 
be developed in this chapter, it might be useful to consider several U.S. economic time series of general interest. 
The time series we consider are: 


DPI = real disposable personal income (billions of dollars) 
GDP = gross domestic product (billions of dollars) 
PCE = real personal consumption expenditure (billions of dollars) 
CP = corporate profits (billions of dollars) 
Dividend = dividends (billions of dollars)


The time period covered is from 1947-I to 2007-IV, for a total of 244 quarters, and all data are seasonally 
adjusted at the annual rate. All the data are collected from FRED, the economic website of the Federal Reserve 
Bank of St. Louis. GDP, DPI, and PCE are in constant dollars, here 2000 dollars. CP and Dividend are in nominal 
dollars. 

To save space, the raw data are posted on the book's website. But to give some idea of these data, we have plotted them in the following two figures. Figure 21.1 is a plot of the logarithms of GDP, DPI, and PCE, and Figure 21.2 presents the logs of the other two time series (CP and Dividend). It is common practice to plot the log of a time series to get a glimpse of its growth rate: changes in the log of a series approximate its proportional changes, so a series growing at a constant rate plots as a straight line. A visual plot of the data is usually the first step in the analysis of time series. In these figures the letter L denotes the natural logarithm.
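Here is a small numerical check of the log-plot convention. It is a sketch that assumes NumPy and uses a hypothetical series growing at exactly 2 percent per quarter (not the FRED data). Each quarter-to-quarter change in the log equals ln(1.02) ≈ 0.0198, close to the 0.02 growth rate, so the log series is a straight line.

```python
import numpy as np

# Hypothetical quarterly levels growing at exactly 2% per quarter
levels = 100.0 * 1.02 ** np.arange(5)
log_levels = np.log(levels)                  # the "L" series plotted in the figures
log_diff = np.diff(log_levels)               # change in the log from quarter to quarter
pct_growth = np.diff(levels) / levels[:-1]   # exact proportional growth rate

print(log_diff)     # each entry is ln(1.02), about 0.0198
print(pct_growth)   # each entry is 0.02 (up to floating-point rounding)
```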

The first impression we get from these two figures is that all these time series seem to be “trending” upward, albeit with fluctuations. Suppose we want to speculate on the shape of these curves beyond the sample period, say for all the quarters of 2008.² We can do that if we know the statistical, or stochastic, mechanism, or the data generating process (DGP), that generated these curves. But what is that mechanism? To answer this and related questions, we need to study some “new” vocabulary that has been developed by time series analysts, to which we now turn.


1At the introductory level, these references may be helpful: Gary Koop, Analysis of Economic Data, John Wiley & Sons, New York, 2000; Jeff B. Cromwell, Walter C. Labys, and Michel Terraza, Univariate Tests for Time Series Models, Sage Publications, California, Ansbury Park, 1994; Jeff B. Cromwell, Michael H. Hannan, Walter C. Labys, and Michel Terraza, Multivariate Tests for Time Series Models, Sage Publications, California, Ansbury Park, 1994; and H. R. Seddighi, K. A. Lawler, and A. V. Katos, Econometrics: A Practical Approach, Routledge, New York, 2000. At the intermediate level, see Walter Enders, Applied Econometric Time Series, John Wiley & Sons, New York, 1995; Kerry Patterson, An Introduction to Applied Econometrics: A Time Series Approach, St. Martin's Press, New York, 2000; T. C. Mills, The Econometric Modelling of Financial Time Series, 2d ed., Cambridge University Press, New York, 1999; Marno Verbeek, A Guide to Modern Econometrics, John Wiley & Sons, New York, 2000; and Wojciech W. Charemza and Derek F. Deadman, New Directions in Econometric Practice: General to Specific Modelling and Vector Autoregression, 2d ed., Edward Elgar Publisher, New York, 1997. At the advanced level, see J. D. Hamilton, Time Series Analysis, Princeton University Press, Princeton, NJ, 1994, and G. S. Maddala and In-Moo Kim, Unit Roots, Cointegration, and Structural Change, Cambridge University Press, 1998. At the applied level, see B. Bhaskara Rao, ed., Cointegration for the Applied Economist, St. Martin's Press, New York, 1994, and Chandan Mukherjee, Howard White, and Marc Wuyts, Econometrics and Data Analysis for Developing Countries, Routledge, New York, 1998.

2Of course, we now have the actual data for this period and could compare them with the data “predicted” on the basis of the earlier period.

Figure 21.1 Logarithms of real GDP, DPI, and PCE, United States, 1947-2007 (quarterly, $ billions).
Note: In the figure the letter L denotes natural logarithm.

Figure 21.2 Logarithms of corporate profits (CP) and dividends, United States, 1947-2007 (quarterly, $ billions).
Note: L denotes logarithm.


21.2 Key Concepts³


What is this vocabulary? It consists of concepts such as these: 


1. Stochastic processes
2. Stationary processes
3. Purely random processes
4. Nonstationary processes
5. Integrated variables
6. Random walk models
7. Cointegration
8. Deterministic and stochastic trends
9. Unit root tests


3The following discussion is based on Maddala et al., op. cit., Charemza et al., op. cit., and Carol Alexander, Market Models: A Guide to Financial Data Analysis, John Wiley & Sons, New York, 2001.


In what follows we will discuss each of these concepts. Our discussion will often be heuristic. Wherever 
possible and helpful, we will provide appropriate examples. 


21.3 Stochastic Processes 


A random or stochastic process is a collection of random variables ordered in time.⁴ If we let Y denote a random variable, and if it is continuous, we denote it as Y(t), but if it is discrete, we denote it as $Y_t$. An example of the former is an electrocardiogram, and an example of the latter is GDP, DPI, etc. Since most economic data are collected at discrete points in time, for our purpose we will use the notation $Y_t$ rather than Y(t). If we let Y represent GDP, for our data we have $Y_1, Y_2, Y_3, \ldots, Y_{242}, Y_{243}, Y_{244}$, where the subscript 1 denotes the first observation (i.e., GDP for the first quarter of 1947) and the subscript 244 denotes the last observation (i.e., GDP for the fourth quarter of 2007). Keep in mind that each of these Y's is a random variable.

In what sense can we regard GDP as a stochastic process? Consider for instance the real GDP of 
$3,759.997 billion for 1970-I. In theory, the GDP figure for the first quarter of 1970 could have been 
any number, depending on the economic and political climate then prevailing. The figure of 3,759.997 is a 
particular realization of all such possibilities.⁵ Therefore, we can say that GDP is a stochastic process and
the actual values we observed for the period 1947-I to 2007-IV are particular realizations of that process 
(i.e., sample). The distinction between the stochastic process and its realization is akin to the distinction 
between population and sample in cross-sectional data. Just as we use sample data to draw inferences about a 
population, in time series we use the realization to draw inferences about the underlying stochastic process. 


Stationary Stochastic Processes 


A type of stochastic process that has received a great deal of attention and scrutiny by time series analysts is 
the so-called stationary stochastic process. Broadly speaking, a stochastic process is said to be stationary 
if its mean and variance are constant over time and the value of the covariance between the two time periods 
depends only on the distance or gap or lag between the two time periods and not the actual time at which the 
covariance is computed. In the time series literature, such a stochastic process is known as a weakly stationary, 
or covariance stationary, or second-order stationary, or wide sense, stochastic process. For the purpose 
of this chapter, and in most practical situations, this type of stationarity often suffices.⁶


4The term “stochastic” comes from the Greek word “stokhos,” which means a target or bull’s-eye. If you have ever 
thrown darts on a dart board with the aim of hitting the bull’s-eye, how often did you hit the bull’s-eye? Out of a 
hundred darts you may be lucky to hit the bull’s-eye only a few times; at other times the darts will be spread randomly 
around the bull’s-eye. 

5You can think of the value of $3,759.997 billion as the mean value of all possible values of GDP for the first quarter of 1970. 
6A time series is strictly stationary if all the moments of its probability distribution and not just the first two (i.e., mean 
and variance) are invariant over time. If, however, the stationary process is normal, the weakly stationary stochastic 
process is also strictly stationary, for the normal stochastic process is fully specified by its two moments, the mean and 
the variance.

To explain weak stationarity, let $Y_t$ be a stochastic time series with these properties:

$$\text{Mean:}\quad E(Y_t) = \mu \qquad (21.3.1)$$
$$\text{Variance:}\quad \operatorname{var}(Y_t) = E(Y_t - \mu)^2 = \sigma^2 \qquad (21.3.2)$$
$$\text{Covariance:}\quad \gamma_k = E[(Y_t - \mu)(Y_{t+k} - \mu)] \qquad (21.3.3)$$

where $\gamma_k$, the covariance (or autocovariance) at lag k, is the covariance between the values of $Y_t$ and $Y_{t+k}$, that is, between two Y values k periods apart. If k = 0, we obtain $\gamma_0$, which is simply the variance of Y (= $\sigma^2$); if k = 1, $\gamma_1$ is the covariance between two adjacent values of Y, the type of covariance we encountered in Chapter 12 (recall the Markov first-order autoregressive scheme).
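Definitions (21.3.1) to (21.3.3) refer to population moments. A common sample analogue of $\gamma_k$ divides the sum of cross products by n. Some texts divide by n − k instead, so the choice of divisor here is an assumption. The sketch below, in Python with NumPy, defines a hypothetical helper `sample_autocov` and applies it to simulated Gaussian white noise. For such noise, $\hat\gamma_0$ should be near σ² = 1 and $\hat\gamma_k$ near 0 for k ≥ 1.

```python
import numpy as np


def sample_autocov(y, k):
    """Sample autocovariance at lag k, dividing by n (a common convention).

    gamma_hat_k = (1/n) * sum_{t=1}^{n-k} (Y_t - Ybar)(Y_{t+k} - Ybar); k = 0 gives the variance.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    d = y - y.mean()
    return float(d[: n - k] @ d[k:]) / n


rng = np.random.default_rng(0)
white_noise = rng.normal(0.0, 1.0, 5_000)        # Gaussian white noise, sigma^2 = 1
gammas = [sample_autocov(white_noise, k) for k in range(4)]
print([round(g, 3) for g in gammas])             # gamma_0 near 1, the others near 0
```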

Suppose we shift the origin of Y from $Y_t$ to $Y_{t+m}$ (say, from the first quarter of 1947 to the first quarter of 1952 for our GDP data). Now if $Y_t$ is to be stationary, the mean, variance, and autocovariances of $Y_{t+m}$ must be the same as those of $Y_t$. In short, if a time series is stationary, its mean, variance, and autocovariance (at various lags) remain the same no matter at what point we measure them; that is, they are time invariant. Such a time series will tend to return to its mean (called mean reversion) and fluctuations around this mean (measured by its variance) will have a broadly constant amplitude.⁷ To put it differently, a stationary process will not drift too far away from its mean value because of the finite variance. As we shall see shortly, this is not the case with nonstationary stochastic processes. It should be noted that for a stationary process the speed of mean reversion depends on the autocovariances; it is quick if the autocovariances are small and slow when they are large, as we will show shortly.

If a time series is not stationary in the sense just defined, it is called a nonstationary time series (keep in 
mind we are talking only about weak stationarity). In other words, a nonstationary time series will have a time- 
varying mean or a time-varying variance or both. 

Why are stationary time series so important? Because if a time series is nonstationary, we can study its 
behavior only for the time period under consideration. Each set of time series data will therefore be for a particular 
episode. As a consequence, it is not possible to generalize it to other time periods. Therefore, for the purpose of 
forecasting, such (nonstationary) time series may be of little practical value. 

How do we know that a particular time series is stationary? In particular, are the time series shown in Figures 
21.1 and 21.2 stationary? We will take this important topic up in Sections 21.8 and 21.9, where we will consider 
several tests of stationarity. But if we depend on common sense, it would seem that the time series depicted in 
Figures 21.1 and 21.2 are nonstationary, at least in the mean values. But more on this later.

Before we move on, we mention a special type of stochastic process (or time series), namely, a purely random, or white noise, process. We call a stochastic process purely random if it has zero mean, constant variance $\sigma^2$, and is serially uncorrelated.⁸ You may recall that the error term $u_t$ entering the classical normal linear regression model that we discussed in Part 1 of this book was assumed to be a white noise process, which we denoted as $u_t \sim \text{IIDN}(0, \sigma^2)$; that is, $u_t$ is independently and identically distributed as a normal distribution with zero mean and constant variance. Such a process is called a Gaussian white noise process.


Nonstationary Stochastic Processes 


Although our interest is in stationary time series, one often encounters nonstationary time series, the classic example being the random walk model (RWM).⁹ It is often said that asset prices, such as stock prices or


7This point has been made by Keith Cuthbertson, Stephen G. Hall, and Mark P. Taylor, Applied Econometric Techniques, The University of Michigan Press, 1995, p. 130.

8If it is also independent, such a process is called strictly white noise.

9The term random walk is often compared with a drunkard's walk. Leaving a bar, the drunkard moves a random distance $u_t$ at time t, and, continuing to walk indefinitely, will eventually drift farther and farther away from the bar. The same is said about stock prices. Today's stock price is equal to yesterday's stock price plus a random shock.
exchange rates, follow a random walk; that is, they are nonstationary. We distinguish two types of random walks: (1) random walk without drift (i.e., no constant or intercept term) and (2) random walk with drift (i.e., a constant term is present).


Random Walk without Drift 


Suppose $u_t$ is a white noise error term with mean 0 and variance $\sigma^2$. Then the series $Y_t$ is said to be a random walk if

$$Y_t = Y_{t-1} + u_t \qquad (21.3.4)$$


In the random walk model, as Eq. (21.3.4) shows, the value of Y at time t is equal to its value at time (t − 1) plus a random shock; thus it is an AR(1) model in the language of Chapters 12 and 17. We can think of Eq. (21.3.4) as a regression of Y at time t on its value lagged one period. Believers in the efficient capital market hypothesis argue that stock prices are essentially random and therefore there is no scope for profitable speculation in the stock market: If one could predict tomorrow's price on the basis of today's price, we would all be millionaires.

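Now from Eq. (21.3.4) we can write

$$Y_1 = Y_0 + u_1$$
$$Y_2 = Y_1 + u_2 = Y_0 + u_1 + u_2$$
$$Y_3 = Y_2 + u_3 = Y_0 + u_1 + u_2 + u_3$$

In general, if the process started at some time 0 with a value of $Y_0$, we have

$$Y_t = Y_0 + \sum_{i=1}^{t} u_i \qquad (21.3.5)$$

Therefore,

$$E(Y_t) = E\left(Y_0 + \sum_{i=1}^{t} u_i\right) = Y_0 \quad \text{(why?)} \qquad (21.3.6)$$

In like fashion, it can be shown that

$$\operatorname{var}(Y_t) = t\sigma^2 \qquad (21.3.7)$$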


As the preceding expressions show, the mean of Y is equal to its initial, or starting, value, which is constant, but as t increases, its variance increases indefinitely, thus violating a condition of stationarity. In short, the RWM without drift is a nonstationary stochastic process. In practice $Y_0$ is often set at zero, in which case $E(Y_t) = 0$.
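Eq. (21.3.7) can be checked by simulation. Note that the variance in question is taken across realizations at a fixed date t, not along a single path. The sketch below uses Python with NumPy, and the number of paths, the horizon, and the seed are arbitrary choices. It generates many independent random walks with $Y_0 = 0$ and σ² = 1, then looks across paths at selected dates. The cross-sectional mean stays near $Y_0 = 0$, while the variance grows roughly like tσ².

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, T, sigma = 20_000, 100, 1.0
u = rng.normal(0.0, sigma, size=(n_paths, T))   # white-noise shocks u_1..u_T per path
Y = np.cumsum(u, axis=1)                         # Eq. (21.3.5) with Y_0 = 0

for t in (1, 25, 100):
    Y_t = Y[:, t - 1]                            # cross-section of all paths at date t
    print(t, round(Y_t.mean(), 2), round(Y_t.var(), 1), t * sigma**2)
```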

An interesting feature of the RWM is the persistence of random shocks (i.e., random errors), which is clear from Eq. (21.3.5): $Y_t$ is the sum of the initial $Y_0$ plus the sum of random shocks. As a result, the impact of a particular shock does not die away. For example, if $u_2 = 2$ rather than $u_2 = 0$, then all $Y_t$'s from $Y_2$ onward will be 2 units higher, and the effect of this shock never dies out. That is why a random walk is said to have an infinite memory: as Kerry Patterson notes, a random walk remembers the shock forever.¹⁰ The sum $\sum u_i$ is also known as a stochastic trend, about which more will be said shortly.
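The infinite memory of the RWM can be made concrete. Feed the same shocks through the recursion twice, once with $u_2 = 0$ and once with $u_2 = 2$, and subtract the two paths. For contrast, the sketch also runs a stationary AR(1) recursion with ρ = 0.5. This value is hypothetical, and the AR(1) case is taken up in Section 21.4. Its response to the same shock decays geometrically. The helper `ar1_path` is illustrative, and the code uses Python with NumPy.

```python
import numpy as np


def ar1_path(u, rho, y0=0.0):
    """Return Y_1..Y_T for Y_t = rho * Y_{t-1} + u_t; rho = 1 gives the RWM."""
    y = np.empty_like(u)
    prev = y0
    for i, shock in enumerate(u):
        prev = rho * prev + shock
        y[i] = prev
    return y


rng = np.random.default_rng(2)
T = 8
u_base = rng.normal(size=T)
u_base[1] = 0.0                # array index 1 corresponds to t = 2: u_2 = 0 ...
u_shock = u_base.copy()
u_shock[1] = 2.0               # ... versus u_2 = 2, all other shocks identical

rw_effect = ar1_path(u_shock, 1.0) - ar1_path(u_base, 1.0)
ar_effect = ar1_path(u_shock, 0.5) - ar1_path(u_base, 0.5)   # stationary comparison
print(np.round(rw_effect, 3))  # 0 at t = 1, then 2 at every later date
print(np.round(ar_effect, 3))  # 0, 2, 1, 0.5, ...: the shock dies out
```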

Interestingly, if you write Eq. (21.3.4) as

$$(Y_t - Y_{t-1}) = \Delta Y_t = u_t \qquad (21.3.8)$$

where Δ is the first difference operator that we discussed in Chapter 12, it is easy to show that, while $Y_t$ is nonstationary, its first difference is stationary, since $\Delta Y_t$ is simply the white noise $u_t$. In other words, the first differences of a random walk time series are stationary. But we will have more to say about this later.
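Since $\Delta Y_t$ in Eq. (21.3.8) is just the white noise $u_t$, differencing a simulated random walk should return the original shocks, up to floating-point rounding. Here is a minimal NumPy sketch with an arbitrary seed.

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.normal(size=1_000)                    # white noise u_1..u_T
Y = np.concatenate(([0.0], np.cumsum(u)))     # Y_0 = 0, then the random walk Y_1..Y_T
dY = np.diff(Y)                               # Delta Y_t = Y_t - Y_{t-1}, Eq. (21.3.8)

print(np.allclose(dY, u))                       # differencing recovers the shocks
print(round(dY.mean(), 3), round(dY.var(), 3))  # near 0 and 1, like u_t
```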


10Kerry Patterson, op. cit., Chapter 6.
Random Walk with Drift
Let us modify Eq. (21.3.4) as follows:

$$Y_t = \delta + Y_{t-1} + u_t \qquad (21.3.9)$$

where δ is known as the drift parameter. The name drift comes from the fact that if we write the preceding equation as

$$Y_t - Y_{t-1} = \Delta Y_t = \delta + u_t \qquad (21.3.10)$$

it shows that $Y_t$ drifts upward or downward, depending on δ being positive or negative. Note that model (21.3.9) is also an AR(1) model.


Following the procedure discussed for random walk without drift, it can be shown that for the random walk with drift model (21.3.9),

$$E(Y_t) = Y_0 + t\delta \qquad (21.3.11)$$
$$\operatorname{var}(Y_t) = t\sigma^2 \qquad (21.3.12)$$


As you can see, for RWM with drift the mean as well as the variance increases over time, again violating the 
conditions of (weak) stationarity. In short, RWM, with or without drift, is a nonstationary stochastic process. 
To give a glimpse of the random walk with and without drift, we conducted two simulations as follows:

$$Y_t = Y_{t-1} + u_t \qquad (21.3.13)$$

where $u_t$ are white noise error terms such that each $u_t \sim N(0, 1)$; that is, each $u_t$ follows the standard normal distribution. From a random number generator, we obtained 500 values of u and generated $Y_t$ as shown in Eq. (21.3.13). We assumed $Y_0 = 0$. Thus, Eq. (21.3.13) is an RWM without drift.
Now consider

$$Y_t = \delta + Y_{t-1} + u_t \qquad (21.3.14)$$

which is RWM with drift. We assumed $u_t$ and $Y_0$ as in Eq. (21.3.13) and assumed that δ = 2.
The graphs of models (21.3.13) and (21.3.14), respectively, are in Figures 21.3 and 21.4. The reader can compare these two diagrams in light of our discussion of the RWM with and without drift.
Figure 21.3 A random walk without drift.
Figure 21.4 A random walk with drift.
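The two simulations behind Figures 21.3 and 21.4 can be sketched as follows, assuming NumPy. The seed is arbitrary, so the paths will not match the book's figures, and the helper `random_walk` is hypothetical. Because both series use the same shocks, they differ by exactly the accumulated drift tδ, mirroring the tδ term in Eq. (21.3.11).

```python
import numpy as np


def random_walk(u, y0=0.0, drift=0.0):
    """Return Y_1..Y_T for Y_t = drift + Y_{t-1} + u_t with Y_0 = y0."""
    return y0 + np.cumsum(drift + u)


rng = np.random.default_rng(2024)   # arbitrary seed; the book's draws are not reproduced
T, Y0, delta = 500, 0.0, 2.0
u = rng.standard_normal(T)          # u_t ~ N(0, 1)

rw = random_walk(u, Y0)                    # Eq. (21.3.13): no drift
rw_drift = random_walk(u, Y0, delta)       # Eq. (21.3.14): drift delta = 2
print(round(rw[-1], 2), round(rw_drift[-1], 2))   # drift path ends near Y0 + T*delta = 1000
```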


The random walk model is an example of what is known in the literature as a unit root process. 
Since this term has gained tremendous currency in the time series literature, we next explain what 
a unit root process is. 


21.4 Unit Root Stochastic Process 


Let us write the RWM (21.3.4) as:

$$Y_t = \rho Y_{t-1} + u_t, \qquad -1 \le \rho \le 1 \qquad (21.4.1)$$

This model resembles the Markov first-order autoregressive model that we discussed in the chapter on autocorrelation. If ρ = 1, Eq. (21.4.1) becomes a RWM (without drift). If ρ is in fact 1, we face what is known as the unit root problem, that is, a situation of nonstationarity; we already know that in this case the variance of $Y_t$ is not stationary. The name unit root is due to the fact that ρ = 1.¹¹ Thus the terms nonstationarity, random walk, unit root, and stochastic trend can be treated synonymously.

If, however, |ρ| < 1, that is, if the absolute value of ρ is less than one, then it can be shown that the time series $Y_t$ is stationary in the sense we have defined it.¹²

In practice, then, it is important to find out if a time series possesses a unit root.¹³ In Section 21.9 we will discuss several tests of unit root, that is, several tests of stationarity. In that section we will also determine whether the time series depicted in Figures 21.1 and 21.2 are stationary. The reader may already suspect that they are not. But we shall see.


11As a technical point: If ρ = 1, we can write Eq. (21.4.1) as $Y_t - Y_{t-1} = u_t$. Now using the lag operator L so that $LY_t = Y_{t-1}$, $L^2 Y_t = Y_{t-2}$, and so on, we can write Eq. (21.4.1) as $(1 - L)Y_t = u_t$. The term unit root refers to the root of the polynomial in the lag operator. If you set (1 − L) = 0, we obtain L = 1, hence the name unit root.

12If in Eq. (21.4.1) it is assumed that the initial value of Y (= $Y_0$) is zero, |ρ| < 1, and $u_t$ is white noise and distributed normally with zero mean and unit variance, then it follows that $E(Y_t) = 0$ and $\operatorname{var}(Y_t) = (1 - \rho^{2t})/(1 - \rho^2)$, which quickly settles at the constant $1/(1 - \rho^2)$ as t increases. Since these moments do not depend on t (asymptotically), by the definition of weak stationarity, $Y_t$ is stationary. On the other hand, as we saw before, if ρ = 1, $Y_t$ is a random walk or nonstationary.

13A time series may contain more than one unit root. But we will discuss this situation later in the chapter.
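The contrast between |ρ| < 1 and ρ = 1 in Eq. (21.4.1) can be checked by Monte Carlo. The sketch below uses Python with NumPy, and ρ = 0.8, the horizon, the number of paths, and the seed are all hypothetical choices. It simulates many paths from $Y_0 = 0$ and compares the cross-path variance at two dates. For ρ = 0.8 the variance settles near 1/(1 − ρ²) ≈ 2.78, as in footnote 12. For ρ = 1 it keeps growing roughly like t.

```python
import numpy as np


def simulate_ar1(rho, T, n_paths, rng):
    """Simulate n_paths of Y_t = rho * Y_{t-1} + u_t with Y_0 = 0 and u_t ~ N(0, 1)."""
    u = rng.standard_normal((n_paths, T))
    Y = np.empty_like(u)
    prev = np.zeros(n_paths)
    for t in range(T):
        prev = rho * prev + u[:, t]
        Y[:, t] = prev
    return Y


rng = np.random.default_rng(7)
rho = 0.8
stationary = simulate_ar1(rho, T=200, n_paths=10_000, rng=rng)
unit_root = simulate_ar1(1.0, T=200, n_paths=10_000, rng=rng)

# |rho| < 1: variance settles near 1 / (1 - rho**2), about 2.78
print(stationary[:, 49].var(), stationary[:, 199].var(), 1 / (1 - rho**2))
# rho = 1: variance keeps growing, roughly t * sigma^2 (about 50, then 200)
print(unit_root[:, 49].var(), unit_root[:, 199].var())
```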
21.5 Trend Stationary (TS) and Difference Stationary (DS) Stochastic Processes


The distinction between stationary and nonstationary stochastic processes (or time series) has a crucial bearing on whether the trend (the slow long-run evolution of the time series under consideration) observed in the constructed time series in Figures 21.3 and 21.4 or in the actual economic time series of Figures 21.1 and 21.2 is deterministic or stochastic. Broadly speaking, if the trend in a time series is a deterministic function of time, such as time, time squared, etc., we call it a deterministic trend, whereas if it is not predictable, we call it a stochastic trend. To make the definition more formal, consider the following model of the time series $Y_t$:

$$Y_t = \beta_1 + \beta_2 t + \beta_3 Y_{t-1} + u_t \qquad (21.5.1)$$

where $u_t$ is a white noise error term and where t is time measured chronologically. Now we have the following possibilities:

Pure random walk: If in Eq. (21.5.1) $\beta_1 = 0$, $\beta_2 = 0$, $\beta_3 = 1$, we get

$$Y_t = Y_{t-1} + u_t \qquad (21.5.2)$$


which is nothing but a RWM without drift and is therefore nonstationary. But note that, if we write Eq. (21.5.2) as

$$\Delta Y_t = (Y_t - Y_{t-1}) = u_t \qquad (21.3.8)$$

it becomes stationary, as noted before. Hence, a RWM without drift is a difference stationary process (DSP).
Random walk with drift: If in Eq. (21.5.1) $\beta_1 \neq 0$, $\beta_2 = 0$, $\beta_3 = 1$, we get

$$Y_t = \beta_1 + Y_{t-1} + u_t \qquad (21.5.3)$$

which is a random walk with drift and is therefore nonstationary. If we write it as

$$(Y_t - Y_{t-1}) = \Delta Y_t = \beta_1 + u_t \qquad (21.5.3a)$$

this means $Y_t$ will exhibit a positive ($\beta_1 > 0$) or negative ($\beta_1 < 0$) trend (see Figure 21.4). Such a trend is called a stochastic trend. Equation (21.5.3a) is a DSP because the nonstationarity in $Y_t$ can be eliminated by taking first differences of the time series. Remember that $u_t$ in Eq. (21.5.3a) is a white noise error term.

Deterministic trend: If in Eq. (21.5.1), $\beta_1 \neq 0$, $\beta_2 \neq 0$, $\beta_3 = 0$, we obtain

$$Y_t = \beta_1 + \beta_2 t + u_t \qquad (21.5.4)$$

which is called a trend stationary process (TSP). Although the mean of $Y_t$ is $\beta_1 + \beta_2 t$, which is not constant, its variance (= $\sigma^2$) is. Once the values of $\beta_1$ and $\beta_2$ are known, the mean can be forecast perfectly. Therefore, if we subtract the mean of $Y_t$ from $Y_t$, the resulting series will be stationary, hence the name trend stationary. This procedure of removing the (deterministic) trend is called detrending.
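Detrending in practice requires estimates of β1 and β2. One simple approach, assumed here rather than prescribed by the text, is to regress $Y_t$ on a constant and t by OLS and keep the residuals. Below is a minimal NumPy sketch with hypothetical β1 = 1, β2 = 0.5, and σ² = 1.

```python
import numpy as np

rng = np.random.default_rng(11)
T = 400
t = np.arange(1, T + 1, dtype=float)
beta1, beta2 = 1.0, 0.5                          # hypothetical trend parameters
Y = beta1 + beta2 * t + rng.standard_normal(T)   # Eq. (21.5.4) with sigma^2 = 1

X = np.column_stack([np.ones(T), t])     # constant and time trend
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
detrended = Y - X @ coef                 # Y minus its estimated (trend) mean

print(np.round(coef, 3))                 # estimates close to (1.0, 0.5)
print(round(detrended.mean(), 6), round(detrended.var(), 3))   # about 0 and about 1
```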

Random walk with drift and deterministic trend: If in Eq. (21.5.1), $\beta_1 \neq 0$, $\beta_2 \neq 0$, $\beta_3 = 1$, we obtain

$$Y_t = \beta_1 + \beta_2 t + Y_{t-1} + u_t \qquad (21.5.5)$$

in which case we have a random walk with drift and a deterministic trend, which can be seen if we write this equation as

$$\Delta Y_t = \beta_1 + \beta_2 t + u_t \qquad (21.5.5a)$$

which means that $Y_t$ is nonstationary.

Deterministic trend with stationary AR(1) component: If in Eq. (21.5.1) $\beta_1 \neq 0$, $\beta_2 \neq 0$, $|\beta_3| < 1$, then we get

$$Y_t = \beta_1 + \beta_2 t + \beta_3 Y_{t-1} + u_t \qquad (21.5.6)$$

which is stationary around the deterministic trend.

To see the difference between stochastic and deterministic trends, consider Figure 21.5.¹⁴ The series named stochastic in this figure is generated by an RWM with drift: $Y_t = 0.5 + Y_{t-1} + u_t$, where 500 values of $u_t$ were generated from a standard normal distribution and where the initial value of Y was set at 1. The series named deterministic is generated as follows: $Y_t = 0.5t + u_t$, where $u_t$ were generated as above and where t is time measured chronologically.

As you can see from Figure 21.5, in the case of the deterministic trend, the deviations from the trend line (which represents the nonstationary mean) are purely random and they die out quickly; they do not contribute to the long-run development of the time series, which is determined by the trend component 0.5t. In the case of the stochastic trend, on the other hand, the random component $u_t$ affects the long-run course of the series $Y_t$.
Figure 21.5 Deterministic versus stochastic trend.


Source: Charemza et al., op. cit., p. 91. 
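The two series of Figure 21.5 can be regenerated along the lines just described. This Python/NumPy sketch uses an arbitrary seed and assumes, for a cleaner comparison, that both series use the same 500 draws of $u_t$. Measured from the line 0.5t, the deterministic series deviates by $u_t$ alone. The stochastic series deviates by $1 + u_1 + \cdots + u_t$, which accumulates.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 500
t = np.arange(1, T + 1)
u = rng.standard_normal(T)                # shared N(0, 1) draws (an assumption)

stochastic = 1.0 + np.cumsum(0.5 + u)     # Y_t = 0.5 + Y_{t-1} + u_t, Y_0 = 1
deterministic = 0.5 * t + u               # Y_t = 0.5 t + u_t

dev_det = deterministic - 0.5 * t         # equals u_t: no memory
dev_sto = stochastic - 0.5 * t            # equals 1 + u_1 + ... + u_t: accumulates
print(round(np.abs(dev_det).max(), 2), round(np.abs(dev_sto).max(), 2))
```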


21.6 Integrated Stochastic Processes 


The random walk model is but a specific case of a more general class of stochastic processes known as 
integrated processes. Recall that the RWM without drift is nonstationary, but its first difference, as shown in 
Eq. (21.3.8), is stationary. Therefore, we call the RWM without drift integrated of order 1, denoted as /(1). 
Similarly, if a time series has to be differenced twice (i.e., take the first difference of the first differences) to 
make it stationary, we call such a time series integrated of order 2.'> In general, if a (nonstationary) time 
series has to be differenced d times to make it stationary, that time series is said to be integrated of order d. 
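Differencing d times is a one-line loop. In the sketch below (plain Python; `diff` is a hypothetical helper and the seed is arbitrary) an I(2) series is built by cumulating white noise twice. Two rounds of differencing recover the shocks, the result agrees with $Y_t - 2Y_{t-1} + Y_{t-2}$, and it does not agree with $Y_t - Y_{t-2}$.

```python
import random
from itertools import accumulate

def diff(series, d=1):
    """Apply the first-difference operator d times (Delta^d)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

rng = random.Random(0)
u = [rng.gauss(0.0, 1.0) for _ in range(200)]   # white noise: I(0)
y1 = list(accumulate(u))                         # random walk: I(1)
y2 = list(accumulate(y1))                        # cumulated random walk: I(2)

dd = diff(y2, d=2)                               # two rounds of differencing
explicit = [y2[t] - 2 * y2[t - 1] + y2[t - 2] for t in range(2, len(y2))]
naive = [y2[t] - y2[t - 2] for t in range(2, len(y2))]   # NOT the second difference

# dd matches the shocks u_3, ..., u_200 and the explicit formula (up to rounding)
print(max(abs(a - b) for a, b in zip(dd, u[2:])))
print(max(abs(a - b) for a, b in zip(dd, explicit)))
print(max(abs(a - b) for a, b in zip(dd, naive)))        # clearly nonzero
```

Each round of differencing drops one observation, which is why the twice-differenced series has two fewer values than the original.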


¹⁴The following discussion is based on Wojciech W. Charemza et al., op. cit., pp. 89-91.
¹⁵For example, if $Y_t$ is I(2), then $\Delta\Delta Y_t = \Delta(Y_t - Y_{t-1}) = \Delta Y_t - \Delta Y_{t-1} = Y_t - 2Y_{t-1} + Y_{t-2}$ will become stationary. But note that
$\Delta^2 Y_t \neq Y_t - Y_{t-2}$.
A time series $Y_t$ integrated of order d is denoted as $Y_t \sim I(d)$. If a time series $Y_t$ is stationary to begin with
(i.e., it does not require any differencing), it is said to be integrated of order zero, denoted by $Y_t \sim I(0)$. Thus,
we will use the terms “stationary time series” and “time series integrated of order zero” to mean the same
thing.

Most economic time series are I(1); that is, they generally become stationary only after taking
their first differences. Are the time series shown in Figures 21.1 and 21.2 I(1) or of higher order? We will
examine them in Sections 21.8 and 21.9.


Properties of Integrated Series 


The following properties of integrated time series may be noted. Let $X_t$, $Y_t$, and $Z_t$ be three time series.


1. If $X_t \sim I(0)$ and $Y_t \sim I(1)$, then $Z_t = (X_t + Y_t) \sim I(1)$; that is, a linear combination or sum of stationary
and nonstationary time series is nonstationary.

2. If $X_t \sim I(d)$, then $Z_t = (a + bX_t) \sim I(d)$, where a and b are constants. That is, a linear combination of
an I(d) series is also I(d). Thus, if $X_t \sim I(0)$, then $Z_t = (a + bX_t) \sim I(0)$.

3. If $X_t \sim I(d_1)$ and $Y_t \sim I(d_2)$, then $Z_t = (aX_t + bY_t) \sim I(d_2)$, where $d_1 < d_2$.

4. If $X_t \sim I(d)$ and $Y_t \sim I(d)$, then $Z_t = (aX_t + bY_t) \sim I(d^*)$; $d^*$ is generally equal to d, but in some
cases $d^* < d$ (see the topic of cointegration in Section 21.11).


As you can see from the preceding statements, one has to pay careful attention in combining two or more time 
series that are integrated of different order. 

To see why this is important, consider the two-variable regression model discussed in Chapter 3,
namely, $Y_t = \beta_1 + \beta_2 X_t + u_t$. Under the classical OLS assumptions, we know that


$$\hat{\beta}_2 = \frac{\sum x_t y_t}{\sum x_t^2} \tag{21.6.1}$$


where the small letters, as usual, indicate deviations from mean values. Suppose $Y_t$ is I(0), but $X_t$ is I(1); that is,
the former is stationary and the latter is not. Since $X_t$ is nonstationary, its variance will increase indefinitely,
so the denominator $\sum x_t^2$ in Eq. (21.6.1) grows faster than the numerator, with the result that $\hat{\beta}_2$ will converge to zero asymptoti-
cally (i.e., in large samples) and it will not even have an asymptotic distribution.¹⁶
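A small Monte Carlo experiment can illustrate the point. The sketch below rests on assumptions of our own ($Y_t$ is N(0, 1) white noise, $X_t$ is an independent driftless random walk, 200 replications per sample size, fixed seed, hypothetical helper names): it computes $\hat\beta_2$ from Eq. (21.6.1) and averages its absolute value for n = 50, 500, and 2000.

```python
import random

def slope(y, x):
    """beta_2 hat from Eq. (21.6.1), using deviations from sample means."""
    my, mx = sum(y) / len(y), sum(x) / len(x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def mean_abs_slope(n, reps=200, seed=1):
    """Average |beta_2 hat| over `reps` draws with Y ~ I(0) and X ~ I(1), independent."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = [rng.gauss(0, 1) for _ in range(n)]      # Y_t: white noise, I(0)
        x, level = [], 0.0
        for _ in range(n):                           # X_t: driftless random walk, I(1)
            level += rng.gauss(0, 1)
            x.append(level)
        total += abs(slope(y, x))
    return total / reps

results = {n: mean_abs_slope(n) for n in (50, 500, 2000)}
for n, value in results.items():
    print(n, round(value, 5))
```

The average slope should shrink roughly like 1/n, faster than the $1/\sqrt{n}$ rate familiar from regressions on stationary data, because $\sum x_t^2$ grows roughly like $n^2$ when $X_t$ is a random walk.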


21.7 The Phenomenon of Spurious Regression 

To see why stationary time series are so important, consider the following two random walk models:
$$Y_t = Y_{t-1} + u_t \tag{21.7.1}$$
$$X_t = X_{t-1} + v_t \tag{21.7.2}$$


where we generated 500 observations of $u_t$ from $u_t \sim N(0, 1)$ and 500 observations of $v_t$ from $v_t \sim N(0, 1)$ and
assumed that the initial values of both Y and X were zero. We also assumed that $u_t$ and $v_t$ are serially uncor-
related as well as mutually uncorrelated. As you know by now, both these time series are nonstationary; that
is, they are I(1) or exhibit stochastic trends.


¹⁶This point is due to Maddala et al., op. cit., p. 26.
Suppose we regress $Y_t$ on $X_t$. Since $Y_t$ and $X_t$ are uncorrelated I(1) processes, the R² from the regression
of Y on X should tend to zero; that is, there should not be any relationship between the two variables. But
wait till you see the regression results:

Variable    Coefficient    Std. Error    t Statistic
C           −13.2556       0.6203        −21.36856
X             0.3376       0.0443          7.61223

R² = 0.1044    d = 0.0190


As you can see, the coefficient of X is highly statistically significant, and, although the R² value is low,
it is statistically significantly different from zero. From these results, you may be tempted to conclude that
there is a significant statistical relationship between Y and X, whereas a priori there should be none. This is
in a nutshell the phenomenon of spurious or nonsense regression, first discovered by Yule.¹⁷ Yule showed
that (spurious) correlation could persist in nonstationary time series even if the sample is very large. That
there is something wrong in the preceding regression is suggested by the extremely low Durbin–Watson
d value, which suggests very strong first-order autocorrelation. According to Granger and Newbold, R² > d
is a good rule of thumb for suspecting that the estimated regression is spurious, as in the example above. It
may be added that the R² and the t statistics from such a spurious regression are misleading: the t statistics
do not follow (Student's) t distribution and, therefore, cannot be used for testing hypotheses about the
parameters.
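The claim about the t statistics can be checked by simulation, in the spirit of the Granger–Newbold experiments cited in footnote 17 (though the setup here is ours, not theirs, and the helper names are hypothetical). The sketch regresses one independent driftless random walk on another 200 times, with n = 500 and a fixed seed, and records how often |t| on the slope exceeds 1.96. If the usual t theory applied, that would happen about 5 percent of the time.

```python
import math
import random

def ols_slope_t(y, x):
    """Bivariate OLS with intercept; returns (slope, t statistic of slope, R^2)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    b2 = sxy / sxx
    rss = syy - b2 * sxy                      # residual sum of squares
    se = math.sqrt(rss / (n - 2) / sxx)       # standard error of the slope
    return b2, b2 / se, 1 - rss / syy

def random_walk(rng, n):
    """Driftless random walk starting from zero."""
    path, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path

rng = random.Random(2024)
reps, n = 200, 500
rejections = 0
for _ in range(reps):
    y, x = random_walk(rng, n), random_walk(rng, n)   # independent I(1) series
    _, t, _ = ols_slope_t(y, x)
    rejections += abs(t) > 1.96
print("share of |t| > 1.96:", rejections / reps)     # nominal size would be 0.05
```

In runs like this the rejection rate is typically far above 5 percent, and it tends to rise with the sample size, which is consistent with Yule's observation that spurious correlation persists in large samples.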

That the regression results presented above are meaningless can be easily seen by regressing the first
differences of $Y_t$ (= $\Delta Y_t$) on the first differences of $X_t$ (= $\Delta X_t$); remember that although $Y_t$ and $X_t$ are nonsta-
tionary, their first differences are stationary. In such a regression you will find that R² is practically zero, as
it should be, and the Durbin–Watson d is about 2. In Exercise 21.24 you are asked to run this regression
and verify the statement just made.
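Exercise 21.24 asks for this comparison; one hedged way to set it up is sketched below. It simulates Eqs. (21.7.1) and (21.7.2) with our own choices (plain Python, seed 7, initial values of zero as in the text, a hypothetical helper `ols_r2_dw`), fits the levels regression and the first-difference regression by OLS with an intercept, and reports R² and the Durbin–Watson d for each. The numbers will not match the book's output, since the draws differ.

```python
import random

def ols_r2_dw(y, x):
    """Bivariate OLS with intercept; returns (R^2, Durbin-Watson d) of the fit."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b2 = sxy / sxx
    b1 = my - b2 * mx
    e = [b - b1 - b2 * a for a, b in zip(x, y)]          # residuals
    rss = sum(v * v for v in e)
    tss = sum((b - my) ** 2 for b in y)
    d = sum((e[t] - e[t - 1]) ** 2 for t in range(1, n)) / rss
    return 1 - rss / tss, d

rng = random.Random(7)
u = [rng.gauss(0, 1) for _ in range(500)]
v = [rng.gauss(0, 1) for _ in range(500)]
Y, X = [0.0], [0.0]                      # initial values of zero, as in the text
for t in range(500):
    Y.append(Y[-1] + u[t])               # Eq. (21.7.1)
    X.append(X[-1] + v[t])               # Eq. (21.7.2)

r2_lev, d_lev = ols_r2_dw(Y, X)          # regression in levels
dY = [b - a for a, b in zip(Y, Y[1:])]
dX = [b - a for a, b in zip(X, X[1:])]
r2_dif, d_dif = ols_r2_dw(dY, dX)        # regression in first differences
print(f"levels:      R2 = {r2_lev:.4f}, d = {d_lev:.4f}")
print(f"differences: R2 = {r2_dif:.4f}, d = {d_dif:.4f}")
```

The levels regression should show a very small d, while the difference regression should show an R² near zero and a d near 2, matching the pattern described above.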

Although dramatic, this example is a strong reminder that one should be extremely wary of conducting 
regression analyses based on time series that exhibit stochastic trends. And one should therefore be extremely 
cautious in reading too much into the regression results based on I(1) variables. For an example, see Exercise
21.26. To some extent, this is true of time series subject to deterministic trends, an example of which is given 
in Exercise 21.25. 


21.8 Tests of Stationarity 


By now the reader probably has a good idea about the nature of stationary stochastic processes and their impor- 
tance. In practice we face two important questions: (1) How do we find out if a given time series is stationary? 
(2) If we find that a given time series is not stationary, is there a way that it can be made stationary? We take up the 
first question in this section and discuss the second question in Section 21.10.

Before we proceed, keep in mind that we are primarily concerned with weak, or covariance, stationarity. 

Although there are several tests of stationarity, we discuss only those that are prominently discussed in the 
literature. In this section we discuss two tests: (1) graphical analysis and (2) the correlogram test. Because 
of the importance attached to it in the recent past, we discuss the unit root test in the next section. We illustrate
these tests with appropriate examples.


¹⁷G. U. Yule, “Why Do We Sometimes Get Nonsense Correlations Between Time Series? A Study in Sampling and the
Nature of Time Series,” Journal of the Royal Statistical Society, vol. 89, 1926, pp. 1-64. For extensive Monte Carlo simula-
tions on spurious regression see C. W. J. Granger and P. Newbold, “Spurious Regressions in Econometrics,” Journal of
Econometrics, vol. 2, 1974, pp. 111-120.
1. Graphical Analysis


As noted earlier, before one pursues formal tests, it is always advisable to plot the time series under study, as 
we have done in Figures 21.1 and 21.2 for the U.S. economic time series data posted on the book’s website. Such 
plots give an initial clue about the likely nature of the time series. Take, for instance, the GDP time series shown 
in Figure 21.1. You will see that over the period of study the log of GDP has been increasing, that is, showing an
upward trend, suggesting that the mean of the log of GDP has been changing over time. This in turn suggests that
the log of the GDP series is not stationary. This is also more or less true of the other U.S. economic time series
shown in Figure 21.2. Such an intuitive feel is the starting point of more formal tests of stationarity. 


2. Autocorrelation Function (ACF) and Correlogram 


One simple test of stationarity is based on the so-called autocorrelation function (ACF). The ACF at lag k, 
denoted by $\rho_k$, is defined as

$$\rho_k = \frac{\gamma_k}{\gamma_0} = \frac{\text{covariance at lag } k}{\text{variance}} \tag{21.8.1}$$


where covariance at lag k and variance are as defined before. Note that if k = 0, $\rho_0 = 1$ (why?).

Since both covariance and variance are measured in the same units of measurement, $\rho_k$ is a unitless, or
pure, number. It lies between −1 and +1, as any correlation coefficient does. If we plot $\rho_k$ against k, the graph
we obtain is known as the population correlogram.

Since in practice we only have a realization (i.e., sample) of a stochastic process, we can only compute the
sample autocorrelation function (SACF), $\hat{\rho}_k$. To compute this, we must first compute the sample covariance
at lag k, $\hat{\gamma}_k$, and the sample variance, $\hat{\gamma}_0$, which are defined as:¹⁸


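$$\hat{\gamma}_k = \frac{\sum (Y_t - \bar{Y})(Y_{t+k} - \bar{Y})}{n} \tag{21.8.2}$$
$$\hat{\gamma}_0 = \frac{\sum (Y_t - \bar{Y})^2}{n} \tag{21.8.3}$$
where n is the sample size and $\bar{Y}$ is the sample mean.
Therefore, the sample autocorrelation function at lag k is:
$$\hat{\rho}_k = \frac{\hat{\gamma}_k}{\hat{\gamma}_0} \tag{21.8.4}$$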


which is simply the ratio of sample covariance (at lag k) to sample variance. A plot of $\hat{\rho}_k$ against k is known
as the sample correlogram.
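Equations (21.8.2) through (21.8.4) translate directly into code. The sketch below (a hypothetical helper `sample_acf` in plain Python) divides both sums by n, as the text does, rather than using the (n − k) and (n − 1) divisors mentioned in footnote 18.

```python
def sample_acf(y, max_lag):
    """Sample autocorrelations rho_hat_0, ..., rho_hat_max_lag per Eqs. (21.8.2)-(21.8.4)."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    gamma0 = sum(d * d for d in dev) / n                                # Eq. (21.8.3)
    acf = []
    for k in range(max_lag + 1):
        gamma_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / n   # Eq. (21.8.2)
        acf.append(gamma_k / gamma0)                                    # Eq. (21.8.4)
    return acf

print(sample_acf([1, 2, 3, 4, 5], 2))   # [1.0, 0.4, -0.1]
```

For the series 1, 2, 3, 4, 5 the sample variance is 2, the lag-1 covariance is 0.8, and the lag-2 covariance is −0.2, which gives the ratios shown in the comment.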

How does a sample correlogram enable us to find out if a particular time series is stationary? For this 
purpose, let us first present the sample correlograms of a purely white noise random process and of a random 
walk process. Return to the driftless RWM (21.3.13). There we generated a sample of 500 error terms, the u’s, 
from the standard normal distribution. The correlogram of these 500 purely random error terms is as shown in
Figure 21.6; we have shown this correlogram up to 30 lags. We will comment shortly on how one chooses the
lag length.


¹⁸Strictly speaking, we should divide the sample covariance at lag k by (n − k) and the sample variance by (n − 1) rather
than by n (why?), where n is the sample size.
[Bar-chart panels: Autocorrelation, Partial Correlation]

Lag      AC       PAC     Q-Stat    Prob
  1   -0.022   -0.022    0.2335    0.629
  2   -0.019   -0.020    0.4247    0.809
  3   -0.009   -0.010    0.4640    0.927
  4   -0.031   -0.031    0.9372    0.919
  5   -0.070   -0.072    3.4186    0.636
  6   -0.008   -0.013    3.4493    0.751
  7    0.048    0.045    4.6411    0.704
  8   -0.069   -0.070    7.0385    0.532
  9    0.022    0.017    7.2956    0.606
 10   -0.004   -0.011    7.3059    0.696
 11    0.024    0.025    7.6102    0.748
 12    0.024    0.027    7.8993    0.793
 13    0.026    0.021    8.2502    0.827
 14   -0.047   -0.046    9.3726    0.806
 15   -0.037   -0.030    10.074    0.815
 16   -0.026   -0.031    10.429    0.843
 17   -0.029   -0.024    10.865    0.863
 18   -0.043   -0.050    11.807    0.857
 19    0.038    0.028    12.575    0.860
 20    0.099    0.093    17.739    0.605
 21    0.001    0.007    17.739    0.665
 22    0.065    0.060    19.923    0.588
 23    0.053    0.055    21.404    0.556
 24   -0.017   -0.004    21.553    0.606
 25   -0.024   -0.005    21.850    0.644
 26   -0.008   -0.008    21.885    0.695
 27   -0.036   -0.027    22.587    0.707
 28    0.053    0.072    24.068    0.678
 29   -0.004   -0.011    24.077    0.725
 30   -0.026   -0.025    24.445    0.752


Figure 21.6 Correlogram of white noise error term u. AC = autocorrelation, PAC = partial autocorrelation (see
Chapter 22), Q-Stat = Q statistic, Prob = probability.


For the time being, just look at the column labeled AC, which is the sample autocorrelation function, and 
the first diagram on the left, labeled Autocorrelation. The solid vertical line in this diagram represents the 
zero axis; observations to the right of the line are positive values and those to the left of the line are negative 
values. As is very clear from this diagram, for a purely white noise process the autocorrelations at various lags 
hover around zero. This is the picture of a correlogram of a stationary time series. Thus, if the corre- 
logram of an actual (economic) time series resembles the correlogram of a white noise time series, we can say 
that time series is probably stationary. 

Now look at the correlogram of a random walk series, as generated, say, by Eq. (21.3.13). The picture is as 
shown in Figure 21.7. The most striking feature of this correlogram is that the autocorrelation coefficients at 
various lags are very high even up to a lag of 33 quarters. As a matter of fact, if we consider lags of up to 60
quarters, the autocorrelation coefficients are quite high; the coefficient is about 0.7 at lag 60. Figure 21.7 is the
typical correlogram of a nonstationary time series: The autocorrelation coefficient starts at a very high value
and declines very slowly toward zero as the lag lengthens.
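The contrast between Figures 21.6 and 21.7 can be reproduced in miniature. This sketch (plain Python, seed 13, our own helper `sample_acf` with the divisor n of Eqs. (21.8.2) to (21.8.4)) draws 500 standard normal shocks, cumulates them into a driftless random walk, and prints the sample autocorrelations of both series at lags 1, 10, and 30.

```python
import random

def sample_acf(y, max_lag):
    """Sample ACF with divisor n, as in Eqs. (21.8.2)-(21.8.4)."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    gamma0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / gamma0
            for k in range(max_lag + 1)]

rng = random.Random(13)
u = [rng.gauss(0, 1) for _ in range(500)]       # white noise
rw, level = [], 0.0
for shock in u:                                 # driftless random walk from the same shocks
    level += shock
    rw.append(level)

acf_wn, acf_rw = sample_acf(u, 30), sample_acf(rw, 30)
for k in (1, 10, 30):
    print(f"lag {k:2d}: white noise {acf_wn[k]:6.3f}   random walk {acf_rw[k]:6.3f}")
```

Expect the white-noise values to stay within a few hundredths of zero and the random-walk values to start near 1 and decline only slowly; the exact figures depend on the seed.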
[Bar-chart panels: Autocorrelation, Partial Correlation]

Lag      AC       PAC     Q-Stat    Prob
  1    0.992    0.992    493.86    0.000
  2    0.984    0.000    980.68    0.000
  3    0.976    0.030    1461.1    0.000
  4    0.969    0.005    1935.1    0.000
  5                      2402.0    0.000
  6    0.953    0.050    2862.7    0.000
  7    0.946    0.004    3317.3    0.000
  8    0.939    0.040    3766.4    0.000
  9    0.932   -0.009    4210.1    0.000
 10                      4649.1    0.000
 11                      5083.9    0.000
 12    0.916    0.039    5514.9    0.000
 13    0.912    0.002    5942.4    0.000
 14    0.908    0.056    6367.0    0.000
 15    0.905    0.061    6789.8    0.000
 16    0.902    0.000    7210.6    0.000
 17    0.899    0.006    7629.4    0.000
 18    0.896    0.030    8046.7    0.000
 19    0.894    0.053    8463.1    0.000
 20    0.892    0.013    8878.7    0.000
 21    0.890   -0.041    9292.6    0.000
 22    0.886   -0.040    9704.1    0.000
 23    0.882   -0.044    10113     0.000
 24    0.878   -0.012    10518     0.000
 25    0.873   -0.023    10920     0.000
 26    0.867   -0.041    11317     0.000
 27    0.860   -0.055    11709     0.000
 28    0.853   -0.045    12095     0.000
 29    0.846   -0.010    12476     0.000
 30    0.839    0.008    12851     0.000
 31    0.832   -0.006    13221     0.000
 32    0.825    0.003    13586     0.000
 33    0.819   -0.006    13946     0.000

Figure 21.7 Correlogram of a random walk time series. See Figure 21.6 for definitions.


Now let us take a concrete economic example. Let us examine the correlogram of the LGDP time
series plotted using the U.S. economic time series data posted on the book’s website (see Section 21.1).
The correlogram up to 36 lags, shown in Figure 21.8, displays a pattern similar to the correlogram
of the random walk model in Figure 21.7. The autocorrelation
coefficient starts at a very high value at lag 1 (0.977) and declines very slowly. Thus it seems that the
LGDP time series is nonstationary. If you plot the correlograms of the other U.S. economic time series
shown in Figures 21.1 and 21.2, you will also see a similar pattern, leading to the conclusion that all
these time series are nonstationary; they may be nonstationary in mean or variance or both.
[Bar-chart panels: Autocorrelation, Partial Correlation]

Lag      AC       PAC     Q-Stat    Prob
  1    0.977    0.977    235.73    0.000
  2    0.954   -0.009    461.43    0.000
  3    0.931   -0.010    677.31    0.000
  4    0.908   -0.006    883.67    0.000
  5    0.886   -0.003    1080.9    0.000
  6    0.864   -0.001    1269.3    0.000
  7    0.843   -0.006    1449.3    0.000
  8    0.822   -0.006    1621.0    0.000
  9    0.801   -0.010    1784.6    0.000
 10    0.780   -0.004    1940.6    0.000
 11    0.759   -0.007    2089.0    0.000
 12    0.738   -0.013    2230.0    0.000
 13    0.718    0.003    2364.1    0.000
 14    0.699   -0.005    2491.5    0.000
 15    0.679   -0.001    2612.4    0.000
 16    0.660   -0.004    2727.2    0.000
 17    0.642   -0.002    2836.2    0.000
 18    0.624    0.002    2939.6    0.000
 19    0.607    0.003    3037.8    0.000
 20    0.590   -0.003    3130.9    0.000
 21    0.573   -0.003    3219.3    0.000
 22    0.557   -0.003    3303.1    0.000
 23    0.541   -0.001    3382.5    0.000
 24    0.526    0.007    3457.9    0.000
 25    0.511    0.002    3529.4    0.000
 26    0.496   -0.005    3597.2    0.000
 27    0.482   -0.011    3661.4    0.000
 28    0.467   -0.009    3722.0    0.000
 29    0.453   -0.005    3779.2    0.000
 30    0.438   -0.006    3833.1    0.000
 31    0.424   -0.005    3883.9    0.000
 32    0.411    0.004    3931.6    0.000
 33    0.398    0.004    3976.7    0.000
 34    0.385   -0.001    4019.1    0.000
 35    0.373   -0.009    4058.9    0.000
 36    0.360   -0.010    4096.3    0.000

Figure 21.8 Correlogram of U.S. LGDP, 1947-I to 2007-IV. See Figure 21.6 for definitions.


Two practical questions may be posed here. First, how do we choose the lag length to compute 
the ACF? Second, how do you decide whether a correlation coefficient at a certain lag is statistically 
significant? The answer follows. 
The Choice of Lag Length


This is basically an empirical question. A rule of thumb is to compute ACF up to one-third to one-quarter the 
length of the time series. Since for our economic data we have 244 quarterly observations, by this rule lags 
of 61 to 81 quarters will do. To save space, we have only shown 36 lags in the ACF graph in Figure 21.8. The 
best practical advice is to start with sufficiently large lags and then reduce them by some statistical criterion, 
such as the Akaike or Schwarz information criterion that we discussed in Chapter 13. Alternatively, one can 
use the following statistical tests. 


Statistical Significance of Autocorrelation Coefficients 


Consider, for instance, the correlogram of the LGDP time series given in Figure 21.8. How do we
decide whether the correlation coefficient of 0.780 at lag 10 (quarters) is statistically significant?
The statistical significance of any $\hat{\rho}_k$ can be judged by its standard error. Bartlett has shown that if
a time series is purely random, that is, it exhibits white noise (see Figure 21.6), the sample autocor-
relation coefficients $\hat{\rho}_k$ are approximately¹⁹


$$\hat{\rho}_k \sim N(0, 1/n) \tag{21.8.5}$$
that is, in large samples the sample autocorrelation coefficients are normally distributed with zero
mean and variance equal to one over the sample size. Since we have 244 observations, the variance
is 1/244 ≈ 0.0041 and the standard error is $\sqrt{0.0041} \approx 0.0640$. Then, following the properties of the
standard normal distribution, the 95 percent confidence interval for any (population) $\rho_k$ is:


$$\hat{\rho}_k \pm 1.96(0.0640) = \hat{\rho}_k \pm 0.1254 \tag{21.8.6}$$
In other words,
$$\Pr(\hat{\rho}_k - 0.1254 \le \rho_k \le \hat{\rho}_k + 0.1254) = 0.95 \tag{21.8.7}$$


If the preceding interval includes the value of zero, we do not reject the hypothesis that the true
$\rho_k$ is zero, but if this interval does not include 0, we reject the hypothesis that the true $\rho_k$ is zero.
Applying this to the estimated value of $\hat{\rho}_{10} = 0.780$, the reader can verify that the 95 percent
confidence interval for the true $\rho_{10}$ is (0.780 ± 0.1254), or (0.6546, 0.9054). Obviously, this interval
does not include the value of zero, suggesting that we are 95 percent confident that the true $\rho_{10}$ is
significantly different from zero.²¹ As you can check, even at lag 20 the estimated $\hat{\rho}_{20}$ is statistically
significant at the 5 percent level.
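The arithmetic behind Eqs. (21.8.5) to (21.8.7) and footnote 21 can be checked in a few lines. The sketch below uses only n = 244 and the two autocorrelations read from Figure 21.8 (`bartlett_interval` is a hypothetical helper); it is a check of the calculation, not a general testing routine.

```python
import math

n = 244                           # quarterly observations, 1947-I to 2007-IV
se = math.sqrt(1 / n)             # Bartlett: rho_hat_k ~ N(0, 1/n) for white noise, Eq. (21.8.5)
half_width = 1.96 * se            # 95 percent margin, Eq. (21.8.6)

def bartlett_interval(rho_hat):
    """95 percent interval rho_hat +/- 1.96 * sqrt(1/n)."""
    return rho_hat - half_width, rho_hat + half_width

# Sample autocorrelations at lags 10 and 20, read from Figure 21.8
for lag, rho_hat in ((10, 0.780), (20, 0.590)):
    lo, hi = bartlett_interval(rho_hat)
    z = rho_hat / se              # Z value as in footnote 21
    verdict = "significant" if (lo > 0 or hi < 0) else "not significant"
    print(f"lag {lag}: interval ({lo:.4f}, {hi:.4f}), Z = {z:.2f}, {verdict}")
```

Because the code does not round the standard error to 0.0640 before multiplying by 1.96, its interval endpoints (for example, 0.6545 and 0.9055 at lag 10) can differ from the text's in the fourth decimal place.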

Instead of testing the statistical significance of any individual autocorrelation coefficient, we can
test the joint hypothesis that all the $\rho_k$ up to a certain lag are simultaneously equal to zero. This can
be done by using the Q statistic developed by Box and Pierce, which is defined as²²


$$Q = n \sum_{k=1}^{m} \hat{\rho}_k^2 \tag{21.8.8}$$


¹⁹M. S. Bartlett, “On the Theoretical Specification of Sampling Properties of Autocorrelated Time Series,” Journal of the
Royal Statistical Society, Series B, vol. 27, 1946, pp. 27-41.

²⁰Our sample size of 244 observations is large enough to justify the normal approximation.

²¹Alternatively, if you divide the estimated value of any $\hat{\rho}_k$ by its standard error $\sqrt{1/n}$, for sufficiently large n you
will obtain the standard Z value, whose probability can be easily obtained from the standard normal table. Thus for the
estimated $\hat{\rho}_{10} = 0.780$, the Z value is 0.780/0.0640 ≈ 12.19. If the true $\rho_{10}$ were in fact zero, the probability of
obtaining a Z value as large as 12.19 or greater is practically zero, thus rejecting the hypothesis that the true $\rho_{10}$ is zero.
²²G. E. P. Box and D. A. Pierce, “Distribution of Residual Autocorrelations in Autoregressive Integrated Moving Average
Time Series Models,” Journal of the American Statistical Association, vol. 65, 1970, pp. 1509-1526.
where n = sample size and m = lag length. The Q statistic is often used as a test of whether a time series is
white noise. In large samples, it is approximately distributed as the chi-square distribution with m df. In an
application, if the computed Q exceeds the critical Q value from the chi-square distribution at the
chosen level of significance, one can reject the null hypothesis that all the (true) $\rho_k$ are zero; at least
some of them must be nonzero.


A variant of the Box–Pierce Q statistic is the Ljung–Box (LB) statistic, which is defined as²³


$$\text{LB} = n(n+2) \sum_{k=1}^{m} \left( \frac{\hat{\rho}_k^2}{n-k} \right) \sim \chi^2_m \tag{21.8.9}$$


Although in large samples both Q and LB statistics follow the chi-square distribution with m df, the LB 
statistic has been found to have better (more powerful, in the statistical sense) small-sample properties than 
the Q statistic. 
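Equations (21.8.8) and (21.8.9) are one-liners once the sample autocorrelations are in hand. The helpers below are hypothetical names of our own; as a check, they are fed the first two LGDP autocorrelations from Figure 21.8.

```python
def box_pierce(acf, n):
    """Q = n * sum of rho_hat_k^2 for k = 1..m, Eq. (21.8.8); acf holds lags 1..m."""
    return n * sum(r * r for r in acf)

def ljung_box(acf, n):
    """LB = n(n + 2) * sum of rho_hat_k^2 / (n - k) for k = 1..m, Eq. (21.8.9)."""
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(acf, start=1))

acf_lgdp = [0.977, 0.954]          # lags 1 and 2 of LGDP, Figure 21.8 (n = 244)
print(round(box_pierce(acf_lgdp, 244), 2), round(ljung_box(acf_lgdp, 244), 2))
```

With the rounded inputs 0.977 and 0.954 and n = 244, `ljung_box` returns about 461.5, close to the 461.43 reported at lag 2 in Figure 21.8 (whose Q-Stat column appears to be the Ljung–Box version); the small gap comes from rounding the AC column to three decimals. Either statistic would then be compared with a chi-square critical value with m df.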

Returning to the LGDP example given in Figure 21.8, the value of the Q statistic up to lag 36 is about
4096. The probability of obtaining such a Q value under the null hypothesis that all 36 (true)
autocorrelation coefficients are zero is practically zero, as the last column of that figure shows.
Therefore, the conclusion is that the LGDP time series is probably nonstationary, reinforcing our
hunch from Figure 21.1 that the LGDP series may be nonstationary. In Exercise 21.16 you are asked to
confirm that the other four U.S. economic time series are also nonstationary.


21.9 The Unit Root Test 


A test of stationarity (or nonstationarity) that has become widely popular over the past several years is the
unit root test. We will first explain it, then illustrate it, and then consider some of its limitations.

The starting point is the unit root (stochastic) process that we discussed in Section 21.4. We start with 

$$Y_t = \rho Y_{t-1} + u_t \qquad -1 \le \rho \le 1 \tag{21.4.1}$$
where $u_t$ is a white noise error term.

We know that if $\rho = 1$, that is, in the case of the unit root, Eq. (21.4.1) becomes a random walk model
without drift, which we know is a nonstationary stochastic process. Therefore, why not simply regress $Y_t$ on
its (one-period) lagged value $Y_{t-1}$ and find out if the estimated $\rho$ is statistically equal to 1? If it is, then $Y_t$ is
nonstationary. This is the general idea behind the unit root test of stationarity.

However, we cannot estimate Eq. (21.4.1) by OLS and test the hypothesis that $\rho = 1$ by the usual t test
because that test is severely biased in the case of a unit root. Therefore, we manipulate Eq. (21.4.1) as follows:
Subtract $Y_{t-1}$ from both sides of Eq. (21.4.1) to obtain:

$$Y_t - Y_{t-1} = \rho Y_{t-1} - Y_{t-1} + u_t = (\rho - 1) Y_{t-1} + u_t \tag{21.9.1}$$
which can be alternatively written as:

$$\Delta Y_t = \delta Y_{t-1} + u_t \tag{21.9.2}$$

where $\delta = (\rho - 1)$ and $\Delta$, as usual, is the first difference operator.

In practice, therefore, instead of estimating Eq. (21.4.1), we estimate Eq. (21.9.2) and test the (null)
hypothesis that $\delta = 0$, the alternative hypothesis being that $\delta < 0$ (see footnote 25). If $\delta = 0$, then $\rho = 1$; that is,
we have a unit root, meaning the time series under consideration is nonstationary.
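A bare-bones version of regression (21.9.2) is sketched below in plain Python. The helpers `df_tau_no_constant` and `ar1` and the simulated data (seed 99, n = 500) are our own illustration: the function regresses $\Delta Y_t$ on $Y_{t-1}$ without an intercept and returns $\hat\delta$, its standard error, and their ratio. As the following paragraphs explain, that ratio must be judged against Dickey–Fuller critical values rather than the t table.

```python
import math
import random

def df_tau_no_constant(y):
    """OLS of Delta Y_t on Y_{t-1}, no intercept (Eq. 21.9.2); returns (delta_hat, se, ratio)."""
    lag = y[:-1]
    dy = [b - a for a, b in zip(y, y[1:])]
    sxx = sum(v * v for v in lag)
    delta = sum(a * b for a, b in zip(lag, dy)) / sxx
    resid = [b - delta * a for a, b in zip(lag, dy)]
    s2 = sum(e * e for e in resid) / (len(dy) - 1)       # one estimated parameter
    se = math.sqrt(s2 / sxx)
    return delta, se, delta / se

def ar1(rho, n, rng):
    """Y_t = rho * Y_{t-1} + u_t with Y_0 = 0 and u_t ~ N(0, 1)."""
    y = [0.0]
    for _ in range(n):
        y.append(rho * y[-1] + rng.gauss(0, 1))
    return y

rng = random.Random(99)
results = {rho: df_tau_no_constant(ar1(rho, 500, rng)) for rho in (1.0, 0.5)}
for rho, (delta, se, tau) in results.items():
    print(f"rho = {rho}: delta_hat = {delta:.4f}, se = {se:.4f}, ratio = {tau:.2f}")
```

For the unit-root series $\hat\delta$ should come out very close to zero, while for the series with $\rho = 0.5$ it should be near −0.5 with a large negative ratio.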


²³G. M. Ljung and G. E. P. Box, “On a Measure of Lack of Fit in Time Series Models,” Biometrika, vol. 66, 1978,
pp. 66-72.
²⁴The Q and LB statistics may not be appropriate in every case. For a critique, see Maddala et al., op. cit., p. 19.
Before we proceed to estimate Eq. (21.9.2), it may be noted that if $\delta = 0$, Eq. (21.9.2) will become
$$\Delta Y_t = (Y_t - Y_{t-1}) = u_t \tag{21.9.3}$$


Since $u_t$ is a white noise error term, it is stationary, which means that the first differences of a
random walk time series are stationary, a point we have already made before.

Now let us turn to the estimation of Eq. (21.9.2). This is simple enough; all we have to do is to
take the first differences of $Y_t$, regress them on $Y_{t-1}$, and see if the estimated slope coefficient
in this regression ($\hat{\delta}$) is zero or not. If it is zero, we conclude that $Y_t$ is nonstationary. But if it is
negative, we conclude that $Y_t$ is stationary.²⁵ The only question is which test we use to find out if
the estimated coefficient of $Y_{t-1}$ in Eq. (21.9.2) is zero or not. You might be tempted to say, why not
use the usual t test? Unfortunately, under the null hypothesis that $\delta = 0$ (i.e., $\rho = 1$), the t value of
the estimated coefficient of $Y_{t-1}$ does not follow the t distribution even in large samples; that is, it
does not have an asymptotic normal distribution.

What is the alternative? Dickey and Fuller have shown that under the null hypothesis that $\delta = 0$,
the estimated t value of the coefficient of $Y_{t-1}$ in Eq. (21.9.2) follows the τ (tau) statistic.²⁶ These
authors have computed the critical values of the tau statistic on the basis of Monte Carlo simula-
tions. A sample of these critical values is given in Appendix D, Table D.7. The table is limited,
but MacKinnon has prepared more extensive tables, which are now incorporated in several econo-
metric packages.²⁷ In the literature the tau statistic or test is known as the Dickey–Fuller (DF) test,
in honor of its discoverers. Interestingly, if the hypothesis that $\delta = 0$ is rejected (i.e., the time series
is stationary), we can use the usual (Student's) t test. Keep in mind that the Dickey–Fuller test is
one-sided because the alternative hypothesis is that $\delta < 0$ (or $\rho < 1$).

The actual procedure of implementing the DF test involves several decisions. In discussing the 
nature of the unit root process in Sections 21.4 and 21.5, we noted that a random walk process may 
have no drift, or it may have drift, or it may have both deterministic and stochastic trends. To allow 
for the various possibilities, the DF test is estimated in three different forms, that is, under three 
different null hypotheses. 


$Y_t$ is a random walk: $\Delta Y_t = \delta Y_{t-1} + u_t$ (21.9.2)
$Y_t$ is a random walk with drift: $\Delta Y_t = \beta_1 + \delta Y_{t-1} + u_t$ (21.9.4)
$Y_t$ is a random walk with drift
around a deterministic trend: $\Delta Y_t = \beta_1 + \beta_2 t + \delta Y_{t-1} + u_t$ (21.9.5)


where t is the time or trend variable. In each case the hypotheses are: 


Null hypothesis: $H_0: \delta = 0$ (i.e., there is a unit root, or the time series is nonstationary, or it has a
stochastic trend).


Alternative hypothesis: $H_1: \delta < 0$ (i.e., the time series is stationary, possibly around a determin-
istic trend).²⁸


²⁵Since $\delta = (\rho - 1)$, for stationarity $\rho$ must be less than one. For this to happen $\delta$ must be negative.

²⁶D. A. Dickey and W. A. Fuller, “Distribution of the Estimators for Autoregressive Time Series with a Unit Root,” Journal of
the American Statistical Association, vol. 74, 1979, pp. 427-431. See also W. A. Fuller, Introduction to Statistical Time Series,
John Wiley & Sons, New York, 1976.

²⁷J. G. MacKinnon, “Critical Values of Cointegration Tests,” in R. E. Engle and C. W. J. Granger, eds., Long-Run Economic
Relationships: Readings in Cointegration, Chapter 13, Oxford University Press, New York, 1991.

²⁸We rule out the possibility that $\delta > 0$, because in that case $\rho > 1$, in which case the underlying time series will be
explosive.
If the null hypothesis is rejected, it means either (1) $Y_t$ is stationary with zero mean, in the case of
Eq. (21.9.2), or (2) $Y_t$ is stationary with nonzero mean, in the case of Eq. (21.9.4). In the case of Eq.
(21.9.5), we can test for $\delta < 0$ (i.e., no stochastic trend) and $\beta_2 \neq 0$ (i.e., the existence of a determin-
istic trend) simultaneously, using the F test but with the critical values tabulated by Dickey and
Fuller. It may be noted that a time series may contain both a stochastic and a deterministic trend.

It is extremely important to note that the critical values of the tau test to test the hypothesis that 
$\delta = 0$ are different for each of the preceding three specifications of the DF test, which can be seen
clearly from Appendix D, Table D.7. Moreover, if, say, specification (21.9.4) is correct, but we 
estimate Eq. (21.9.2), we will be committing a specification error, whose consequences we already 
know from Chapter 13. The same is true if we estimate Eq. (21.9.4) rather than the true Eq. (21.9.5). 
Of course, there is no way of knowing which specification is correct to begin with. Some trial and 
error is inevitable, data mining notwithstanding. 

The actual estimation procedure is as follows: Estimate Eq. (21.9.2), Eq. (21.9.4), or Eq.
(21.9.5) by OLS; divide the estimated coefficient of $Y_{t-1}$ in each case by its standard error to
compute the tau (τ) statistic; and refer to the DF tables (or any statistical package). If the computed
absolute value of the tau statistic (|τ|) exceeds the absolute DF or MacKinnon critical tau value, we
reject the hypothesis that $\delta = 0$, in which case the time series is stationary. On the other hand, if the
computed |τ| does not exceed the absolute critical tau value, we do not reject the null hypothesis, in
which case the time series is nonstationary. Make sure that you use the appropriate critical τ values.
In most applications the tau value will be negative. Therefore, alternatively we can say that if the
computed (negative) tau value is smaller than (i.e., more negative than) the critical tau value, we
reject the null hypothesis (i.e., the time series is stationary); otherwise, we do not reject it (i.e., the
time series is nonstationary).
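The procedure just described can be sketched for all three specifications. The code below is a hypothetical illustration in plain Python (the helpers `ols` and `df_tau` are our own, not EViews or any library API): it estimates Eqs. (21.9.2), (21.9.4), and (21.9.5) by OLS on a simulated random walk with drift, forms τ as the coefficient of $Y_{t-1}$ divided by its standard error, and applies the one-sided rule using the 5 percent critical values quoted later in this section for n = 250, which serve here only as approximations.

```python
import math
import random

def ols(y, X):
    """OLS via the normal equations (X is a list of rows); returns (coefficients, standard errors)."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    aug = [row[:] + [float(i == j) for j in range(k)] for i, row in enumerate(xtx)]
    for c in range(k):                                   # Gauss-Jordan inversion of X'X
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(X, y)]
    s2 = sum(e * e for e in resid) / (len(y) - k)
    return beta, [math.sqrt(s2 * inv[i][i]) for i in range(k)]

def df_tau(y, spec):
    """tau on Y_{t-1}: spec is 'none' (21.9.2), 'drift' (21.9.4) or 'trend' (21.9.5)."""
    deterministic = {"none": lambda t: [], "drift": lambda t: [1.0],
                     "trend": lambda t: [1.0, float(t)]}[spec]
    dy = [b - a for a, b in zip(y, y[1:])]               # dy[t-1] = Delta Y_t
    rows = [deterministic(t) + [y[t - 1]] for t in range(1, len(y))]
    beta, se = ols(dy, rows)
    return beta[-1] / se[-1]                             # Y_{t-1} is the last regressor

rng = random.Random(5)
y = [0.0]
for _ in range(300):
    y.append(0.2 + y[-1] + rng.gauss(0, 1))             # random walk with drift (hypothetical data)

crit_5pct = {"none": -1.95, "drift": -2.88, "trend": -3.43}
for spec in ("none", "drift", "trend"):
    tau = df_tau(y, spec)
    decision = "reject H0: delta = 0" if tau < crit_5pct[spec] else "do not reject H0"
    print(f"{spec:5s} tau = {tau:7.3f}  {decision}")
```

Because the simulated series has drift, the no-intercept specification is misspecified for it, a small-scale version of the specification problem discussed above.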

Let us return to the U.S. GDP time series. For this series, the results of the three regressions
(21.9.2), (21.9.4), and (21.9.5) are as follows. The dependent variable in each case is $\Delta Y_t = \Delta \text{LGDP}_t$,
where LGDP is the logarithm of real GDP.


$$\Delta \text{LGDP}_t = 0.000968\,\text{LGDP}_{t-1} \tag{21.9.6}$$
t = (12.9271)    R² = 0.0147    d = 1.3195

$$\Delta \text{LGDP}_t = 0.0221 - 0.00165\,\text{LGDP}_{t-1} \tag{21.9.7}$$
t = (2.4342)  (−1.5294)    R² = 0.0096    d = 1.3484

$$\Delta \text{LGDP}_t = 0.2094 + 0.0002\,t - 0.0269\,\text{LGDP}_{t-1} \tag{21.9.8}$$
t = (1.8988)  (1.7040)  (−1.8102)    R² = 0.0215    d = 1.3308


Our primary interest in all these regressions is in the t (= τ) value of the $\text{LGDP}_{t-1}$ coefficient. If
you look at Table D.7 in Appendix D, you will see that the 5 percent critical tau values for sample
size 250 (the closest number to our sample of 244 observations) are −1.95 (no intercept, no trend),
−2.88 (intercept but no trend), and −3.43 (intercept as well as trend). EViews and other statistical
packages provide critical values for the sample size used in the analysis.

Before we examine the results, we have to decide which of the three models may be appro-
priate. We should rule out model (21.9.6) because the coefficient of $\text{LGDP}_{t-1}$, which is equal to $\delta$,
is positive. But since $\delta = (\rho - 1)$, a positive $\delta$ would imply that $\rho > 1$. Although this is a theoretical possi-
bility, we rule it out because in this case the LGDP time series would be explosive.²⁹ That leaves

²⁹More technically, since Eq. (21.9.2) is a first-order difference equation, the so-called stability condition requires that
$|\rho| < 1$.
us with models (21.9.7) and (21.9.8). In both cases the estimated $\delta$ coefficient is negative, implying
that the estimated $\rho$ is less than 1. For these two models, the estimated $\rho$ values are 0.9984 and
0.9731, respectively. The only question now is whether these values are statistically significantly below 1,
which would allow us to declare that the GDP time series is stationary.

For model (21.9.7) the estimated τ value is −1.5294, whereas the 5 percent critical τ value, as
noted above, is −2.88. Since, in absolute terms, the former is smaller than the latter, our conclusion
is that the LGDP time series is not stationary.³⁰

The story is the same for model (21.9.8). The computed τ value of −1.8102, in absolute terms, is
smaller than the 5 percent critical value of −3.43.
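The decision rule and the implied $\rho$ estimates for models (21.9.7) and (21.9.8) can be reproduced from the reported numbers alone. The snippet below hard-codes the $\delta$ estimates, τ values, and 5 percent critical values quoted in the text; `reject_unit_root` is a hypothetical helper name.

```python
# delta estimates, tau values, and 5 percent critical values reported in the text
models = {
    "21.9.7": {"delta": -0.00165, "tau": -1.5294, "crit_5pct": -2.88},  # intercept, no trend
    "21.9.8": {"delta": -0.0269, "tau": -1.8102, "crit_5pct": -3.43},   # intercept and trend
}

def reject_unit_root(tau, critical):
    """One-sided DF rule: reject H0 (delta = 0) only if tau is more negative than the critical value."""
    return tau < critical

for name, m in models.items():
    rho_hat = 1 + m["delta"]                      # since delta = rho - 1
    verdict = "reject unit root" if reject_unit_root(m["tau"], m["crit_5pct"]) else "do not reject unit root"
    print(f"model ({name}): rho_hat = {rho_hat:.4f}, tau = {m['tau']}, critical = {m['crit_5pct']}: {verdict}")
```

Both τ values lie to the right of their critical values, so neither model rejects the unit root, and $\hat\rho = 1 + \hat\delta$ gives roughly 0.9984 and 0.9731.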

Therefore, on the basis of graphical analysis, the correlogram, and the Dickey–Fuller test, the
conclusion is that for the quarterly periods of 1947 to 2007, the U.S. LGDP time series was nonsta-
tionary; i.e., it contained a unit root, or it had a stochastic trend.


The Augmented Dickey–Fuller (ADF) Test


In conducting the DF test as in Eqs. (21.9.2), (21.9.4), and (21.9.5), it was assumed that the error
term $u_t$ was uncorrelated. But in case the $u_t$ are correlated, Dickey and Fuller have developed another
test, known as the augmented Dickey–Fuller (ADF) test. This test is conducted by “augmenting” the
preceding three equations by adding the lagged values of the dependent variable $\Delta Y_t$. To be specific,
suppose we use Eq. (21.9.5). The ADF test here consists of estimating the following regression:


$$\Delta Y_t = \beta_1 + \beta_2 t + \delta Y_{t-1} + \sum_{i=1}^{m} \alpha_i\,\Delta Y_{t-i} + \varepsilon_t \qquad (21.9.9)$$

where $\varepsilon_t$ is a pure white noise error term and where $\Delta Y_{t-1} = (Y_{t-1} - Y_{t-2})$, $\Delta Y_{t-2} = (Y_{t-2} - Y_{t-3})$, etc. The number of lagged difference terms to include is often determined empirically, the idea being to include enough terms so that the error term in Eq. (21.9.9) is serially uncorrelated, so that we can obtain an unbiased estimate of $\delta$, the coefficient of $Y_{t-1}$. EViews 6 has an option that automatically selects the lag length based on the Akaike, Schwarz, and other information criteria. In ADF we still test whether $\delta = 0$, and the ADF test statistic follows the same asymptotic distribution as the DF statistic, so the same critical values can be used.
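The ADF regression (21.9.9) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the EViews procedure: the helper names `ols` and `adf_tau` are hypothetical, the lag length is fixed by the user rather than chosen by an information criterion, and the data are simulated (a random walk and a stationary AR(1) series with $\rho = 0.5$), not the LGDP series. The $\tau$ value returned must still be compared with Dickey–Fuller critical values.

```python
import random
from typing import List, Sequence, Tuple


def ols(X: List[List[float]], y: Sequence[float]) -> Tuple[List[float], List[float]]:
    """OLS by the normal equations; returns (coefficients, standard errors)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    # Invert X'X with Gauss-Jordan elimination on the augmented matrix [X'X | I]
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                factor = aug[r][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(k)]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [v - sum(b * x for b, x in zip(beta, row)) for row, v in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    return beta, [(s2 * inv[i][i]) ** 0.5 for i in range(k)]


def adf_tau(y: Sequence[float], lags: int, trend: bool = True) -> Tuple[float, float]:
    """Fit dY_t = b1 [+ b2*t] + delta*Y_{t-1} + sum_{i=1..lags} a_i*dY_{t-i} + e_t.

    Returns (delta_hat, tau). tau is the ordinary t ratio of delta_hat, but it
    must be compared with Dickey-Fuller critical values, not Student t values.
    """
    dy = [y[s] - y[s - 1] for s in range(1, len(y))]  # dy[t - 1] is dY_t
    X, z = [], []
    for t in range(lags + 1, len(y)):
        row = [1.0] + ([float(t)] if trend else []) + [y[t - 1]]
        row += [dy[t - 1 - i] for i in range(1, lags + 1)]  # dY_{t-1}, ..., dY_{t-lags}
        X.append(row)
        z.append(dy[t - 1])
    beta, se = ols(X, z)
    j = 2 if trend else 1  # column holding Y_{t-1}
    return beta[j], beta[j] / se[j]


rng = random.Random(0)
walk, ar1 = [0.0], [0.0]
for _ in range(400):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))      # rho = 1: unit root
    ar1.append(0.5 * ar1[-1] + rng.gauss(0.0, 1.0))  # rho = 0.5: stationary

for name, series in (("random walk", walk), ("AR(1), rho = 0.5", ar1)):
    delta, tau = adf_tau(series, lags=4)
    print(f"{name}: delta = {delta:.4f}, tau = {tau:.2f}")
```

For the stationary series $\hat\delta$ should be near $\rho - 1 = -0.5$ with a large negative $\tau$; for the random walk $\hat\delta$ should be close to zero.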

To give a glimpse of this procedure, we estimated Eq. (21.9.9) for the LGDP series. Since we have quarterly data, we decided to use four lags. The results of the ADF regression are as follows:³¹


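$$\begin{aligned}
\Delta\text{LGDP}_t = {}& 0.2677 + 0.0003\,t - 0.0352\,\text{LGDP}_{t-1} + 0.2990\,\Delta\text{LGDP}_{t-1} + 0.1451\,\Delta\text{LGDP}_{t-2}\\
& - 0.0621\,\Delta\text{LGDP}_{t-3} - 0.0876\,\Delta\text{LGDP}_{t-4}
\end{aligned} \qquad (21.9.10)$$
$$t = (2.4130)\quad (2.2561)\quad (-2.3443)\quad (4.6255)\quad (2.1575)\quad (-0.9205)\quad (-1.3438)$$
$$R^2 = 0.1617 \qquad d = 2.0075$$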


The $t$ (= $\tau$) value of the coefficient of $\text{LGDP}_{t-1}$ (= $\delta$) is −2.3443, which in absolute terms is much less than even the 10 percent critical $\tau$ value of −3.1378, again suggesting that even after taking care of possible autocorrelation in the error term, the LGDP series is nonstationary. (Note: The @trend command in EViews automatically generates the time or trend variable.)


30 Another way of stating this is that the computed $\tau$ value should be more negative than the critical $\tau$ value, which is not the case here. Hence the conclusion stands. Since in general $\delta$ is expected to be negative, the estimated $\tau$ statistic will have a negative sign. Therefore, a large negative $\tau$ value is generally an indication of stationarity.


31 Higher-order lagged differences were considered, but they were insignificant.
Could this result stem from our choosing only four lagged values of $\Delta\text{LGDP}$? We also used the Schwarz criterion with 14 lagged values of $\Delta\text{LGDP}$, which gave a $\tau$ value for $\delta$ of −1.8102. Even then, this $\tau$ value was not significant at the 10 percent level (the critical $\tau$ value at this level was −3.1376). It seems that logged GDP is nonstationary.


Testing the Significance of More than One Coefficient: The F Test 


Suppose we estimate model (21.9.5) and test the hypothesis that $\beta_1 = \beta_2 = 0$, that is, that the model is a RWM without drift and trend. To test this joint hypothesis, we can use the restricted F test discussed in Chapter 8. That is, we estimate Eq. (21.9.5) (the unrestricted regression) and then estimate Eq. (21.9.5) again, dropping the intercept and trend. Then we use the restricted F test as shown in Eq. (8.6.9), except that we cannot use the conventional F table to get the critical F values. As they did with the $\tau$ statistic, Dickey and Fuller have developed critical F values for this situation, a sample of which is given in Appendix D, Table D.7. An example is presented in Exercise 21.27.
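The joint test can be computed from the two residual sums of squares with the usual restricted F statistic in its residual-sum-of-squares form, $F = [(\text{RSS}_R - \text{RSS}_{UR})/m]/[\text{RSS}_{UR}/(n - k)]$. In the sketch below the function name and the RSS values are hypothetical; the resulting statistic would be compared with the Dickey–Fuller F values (such as those in Table D.7), not with the conventional F table.

```python
def restricted_f(rss_r: float, rss_ur: float, m: int, n: int, k: int) -> float:
    """Restricted F statistic [(RSS_R - RSS_UR)/m] / [RSS_UR/(n - k)].

    rss_r  : residual sum of squares with the intercept and trend dropped
    rss_ur : residual sum of squares of the unrestricted Eq. (21.9.5)
    m      : number of restrictions (2 for beta1 = beta2 = 0)
    n, k   : observations and parameters in the unrestricted model (k = 3 here)
    """
    return ((rss_r - rss_ur) / m) / (rss_ur / (n - k))


# Hypothetical residual sums of squares, for illustration only
f_stat = restricted_f(rss_r=10.0, rss_ur=8.0, m=2, n=100, k=3)
print(round(f_stat, 3))  # compare with Dickey-Fuller F values, not the usual F table
```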


The Phillips–Perron (PP) Unit Root Tests³²


An important assumption of the DF test is that the error terms $u_t$ are independently and identically distributed. The ADF test adjusts the DF test to take care of possible serial correlation in the error terms by adding the lagged difference terms of the regressand. Phillips and Perron use nonparametric statistical methods to take care of the serial correlation in the error terms without adding lagged difference terms. Since the asymptotic distribution of the PP test statistic is the same as that of the ADF test statistic, we will not pursue this topic here.


Testing for Structural Changes 


The macroeconomic data introduced in Section 21.1 (see the book’s website for the actual data) are for the period 1947–2007, a period of 61 years. In this period the U.S. economy experienced
several business cycles of varying durations. Business cycles are marked by periods of recessions 
and periods of expansions. It is quite likely that one business cycle is different from another, which 
may reflect structural breaks or structural changes in the economy. 

For instance, take the first oil embargo in 1973. It quadrupled oil prices. Prices again increased substan- 
tially after the second oil embargo in 1979. Naturally, these shocks will affect economic behavior. Therefore, 
if we were to regress personal consumption expenditure (PCE) on disposable personal income (DPI), the 
intercept, the slope, or both are likely to change from one business cycle to another (recall the Chow test of 
structural breaks). This is what is meant by structural changes. 

Perron, for instance, has argued that the standard tests of the unit root hypothesis may not be reliable in the presence of structural changes.³³ There are ways to test for structural changes and to account for them, the simplest involving the use of dummy variables. But a discussion of the various tests of structural breaks would take us far afield and is best left to the references.³⁴ However, see Exercise 21.28.


32 P. C. B. Phillips and P. Perron, “Testing for a Unit Root in Time Series Regression,” Biometrika, vol. 75, 1988, pp. 335–346. The PP test is now included in several software packages.

33 P. Perron, “The Great Crash, the Oil Price Shock and the Unit Root Hypothesis,” Econometrica, vol. 57, 1989, pp. 1361–1401.

34 For an accessible discussion, see James H. Stock and Mark W. Watson, Introduction to Econometrics, 2d ed., Pearson/Addison-Wesley, Boston, 2007, pp. 565–571. For a more thorough discussion, see G. S. Maddala and In-Moo Kim, Unit Roots, Cointegration, and Structural Change, Cambridge University Press, New York, 1998.
A Critique of the Unit Root Tests³⁵


We have discussed several unit root tests, and there are several more. The question is: Why are there so many unit root tests? The answer lies in the size and power of these tests. By the size of a test we mean the level of significance (i.e., the probability of committing a Type I error), and by the power of a test we mean the probability of rejecting the null hypothesis when it is false. The power of a test is calculated by subtracting the probability of a Type II error from 1, where a Type II error consists in accepting a false null hypothesis. The maximum power is 1. Most unit root tests are based on the null hypothesis that the time series under consideration has a unit root; that is, it is nonstationary. The alternative hypothesis is that the time series is stationary.


Size of Test 


You will recall from Chapter 13 the distinction we made between the nominal and the true levels of significance. The DF test is sensitive to the way it is conducted. Remember that we discussed three varieties of the DF test: (1) a pure random walk, (2) a random walk with drift, and (3) a random walk with drift and trend. If, for example, the true model is (1) but we estimate (2) and conclude, say at the 5 percent level, that the time series is stationary, this conclusion may be wrong because the true level of significance in this case is much larger than 5 percent.³⁶ The size distortion could also result from excluding moving average (MA) components from the model (on moving average, see Chapter 22).


Power of Test 


Most tests of the DF type have low power; that is, they tend to accept the null of a unit root more frequently than is warranted. In other words, these tests may find a unit root even when none exists. There are several reasons for this. First, the power depends on the (time) span of the data more than on the mere size of the sample. For a given sample size $n$, the power is greater when the span is large. Thus, a unit root test based on 30 observations over a span of 30 years may have more power than one based on, say, 100 observations over a span of 100 days. Second, if $\rho \approx 1$ but not exactly 1, the unit root test may declare such a time series nonstationary. Third, these types of tests assume a single unit root; that is, they assume that the given time series is I(1). But if a time series is integrated of order higher than 1, say, I(2), there will be more than one unit root. In the latter case one may use the Dickey–Pantula test.³⁷ Fourth, if there are structural breaks in a time series (see the chapter on dummy variables) due to, say, the OPEC oil embargoes, the unit root tests may not catch them.
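The low power of DF-type tests when $\rho$ is close to (but below) 1 can be seen in a small Monte Carlo experiment. The sketch below is an illustration under assumptions: it uses the no-intercept DF regression, a critical value of about −1.95 (the commonly tabulated 5 percent value for that case), samples of 100 observations starting at zero, and hypothetical helper names. For $\rho = 1$ the rejection rate estimates the size of the test; for $\rho < 1$ it estimates the power.

```python
import random
from typing import Sequence


def df_tau_no_constant(y: Sequence[float]) -> float:
    """tau for dY_t = delta * Y_{t-1} + u_t (the pure random walk version of the DF test)."""
    lagged = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    sxx = sum(v * v for v in lagged)
    delta = sum(v * d for v, d in zip(lagged, dy)) / sxx
    s2 = sum((d - delta * v) ** 2 for v, d in zip(lagged, dy)) / (len(dy) - 1)
    return delta / (s2 / sxx) ** 0.5


def rejection_rate(rho: float, n: int = 100, reps: int = 1000,
                   critical: float = -1.95, seed: int = 1) -> float:
    """Share of simulated AR(1) samples Y_t = rho * Y_{t-1} + u_t in which the DF
    test rejects the unit root (size when rho = 1, power when rho < 1)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        y = [0.0]
        for _ in range(n):
            y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
        rejections += df_tau_no_constant(y) < critical
    return rejections / reps


for rho in (1.0, 0.99, 0.95, 0.9, 0.8):
    print(f"rho = {rho}: rejection rate = {rejection_rate(rho):.3f}")
```

With $\rho = 0.99$ the series is stationary, yet the test rejects the unit root only slightly more often than under the null, which is the low-power problem described above.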

In applying the unit root tests one should therefore keep in mind their limitations. Of course, these tests have been modified by Perron and Ng; Elliott, Rothenberg, and Stock; Fuller; and Leybourne.³⁸ Because of this, Maddala and Kim advocate that the traditional DF, ADF, and PP tests be discarded. As econometric software packages incorporate the new tests, that may very well happen. But it should be added that as yet there is no uniformly powerful test of the unit root hypothesis.


21.10 Transforming Nonstationary Time Series 


Now that we know the problems associated with nonstationary time series, the practical question is what to do. To avoid the spurious regression problem that may arise from regressing a nonstationary time series on one or more nonstationary time series, we have to transform nonstationary time series to make them stationary. The transformation method depends on whether the time series are difference stationary (DSP) or trend stationary (TSP). We consider each of these methods in turn.

35 For a detailed discussion, see Terrence C. Mills, op. cit., pp. 87–88.

36 For a Monte Carlo experiment about this, see Charemza et al., op. cit., p. 114.

37 D. A. Dickey and S. Pantula, “Determining the Order of Differencing in Autoregressive Processes,” Journal of Business and Economic Statistics, vol. 5, 1987, pp. 455–461.

38 A discussion of these tests can be found in Maddala et al., op. cit., Chapter 4.


Difference-Stationary Processes 


If a time series has a unit root, its first differences are stationary.³⁹ Therefore, the solution here is to take the first differences of the time series.
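Taking (log) first differences is mechanical; a minimal sketch follows. The helper `difference` is hypothetical, and the GDP levels are made-up numbers, not the book's data. Applying the operator d times corresponds to differencing an I(d) series d times, as footnote 39 notes.

```python
import math
from typing import List, Sequence


def difference(series: Sequence[float], d: int = 1) -> List[float]:
    """Apply the first-difference operator d times; each pass loses one observation."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out[:-1], out[1:])]
    return out


# Hypothetical quarterly GDP levels (illustrative numbers, not the book's data)
gdp = [1000.0, 1012.0, 1020.0, 1031.0, 1029.0]
lgdp = [math.log(v) for v in gdp]
d_lgdp = difference(lgdp)  # D_t = LGDP_t - LGDP_{t-1}: approximate quarterly growth rates
print([round(v, 4) for v in d_lgdp])
print(difference([1, 4, 9, 16, 25], d=2))  # two passes of the operator: [2, 2, 2]
```

Differencing the logarithms rather than the levels is what makes $D_t$ interpretable as an approximate growth rate.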

Returning to our U.S. LGDP time series, we have already seen that it has a unit root. Let us now see what 
happens if we take the first differences of the LGDP series. 

Let $\Delta\text{LGDP}_t = (\text{LGDP}_t - \text{LGDP}_{t-1})$. For convenience, let $D_t = \Delta\text{LGDP}_t$. Now consider the following regression:


$$\Delta D_t = 0.00557 - 0.6711\,D_{t-1} \qquad (21.10.1)$$
$$t = (7.1407) \qquad (-11.0204)$$
$$R^2 = 0.3360 \qquad d = 2.0542$$


The 1 percent critical DF $\tau$ value is −3.4573. Since the computed $\tau$ (= $t$) of −11.0204 is more negative than the critical value, we conclude that the first-differenced LGDP is stationary; that is, it is I(0). This is also apparent from Figure 21.9. If you compare Figure 21.9 with Figure 21.1, you will see the obvious difference between the two.
Figure 21.9 First differences of logs of U.S. GDP, 1947–2007 (quarterly).


Trend-Stationary Processes 


As we have seen in Figure 21.5, a TSP is stationary around the trend line. Hence, the simplest way to make such a time series stationary is to regress it on time; the residuals from this regression will then be stationary. In other words, run the following regression:

$$Y_t = \beta_1 + \beta_2 t + u_t \qquad (21.10.2)$$

where $Y_t$ is the time series under study and where $t$ is the trend variable measured chronologically. Now

$$\hat u_t = (Y_t - \hat\beta_1 - \hat\beta_2 t) \qquad (21.10.3)$$

will be stationary. $\hat u_t$ is known as a (linearly) detrended time series.

39 If a time series is I(2), it will contain two unit roots, in which case we will have to difference it twice. If it is I(d), it has to be differenced d times, where d is any integer.

It is important to note that the trend may be nonlinear. For example, it could be

$$Y_t = \beta_1 + \beta_2 t + \beta_3 t^2 + u_t \qquad (21.10.4)$$

which is a quadratic trend series. If that is the case, the residuals from Eq. (21.10.4) will now be a (quadratically) detrended time series.
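Linear and quadratic detrending, Eqs. (21.10.2) to (21.10.4), amount to keeping the residuals from a regression on powers of $t$. The plain-Python sketch below uses hypothetical helper names; as an implementation choice (an assumption, not the text's method) it centers $t$ before fitting, which changes the coefficients but leaves the fitted values and residuals unchanged.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def detrend(y: Sequence[float], degree: int = 1) -> List[float]:
    """Residuals from regressing y on 1, t, ..., t**degree, with t = 1, ..., n.

    degree=1 corresponds to Eqs. (21.10.2)-(21.10.3); degree=2 to Eq. (21.10.4).
    """
    n = len(y)
    t_bar = (n + 1) / 2.0
    # Centering t only improves the conditioning of the normal equations
    rows = [[(t - t_bar) ** p for p in range(degree + 1)] for t in range(1, n + 1)]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(rows, y)]


# A purely deterministic quadratic trend (hypothetical coefficients)
quad = [2.0 + 0.5 * t + 0.01 * t * t for t in range(1, 51)]
print(max(abs(e) for e in detrend(quad, degree=1)))  # linear detrending leaves curvature
print(max(abs(e) for e in detrend(quad, degree=2)))  # essentially zero
```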

It should be pointed out that if a time series is DSP but we treat it as TSP, this is called underdifferencing. On the other hand, if a time series is TSP but we treat it as DSP, this is called overdifferencing. The consequences of these types of specification errors can be serious, depending on how one handles the serial correlation properties of the resulting error terms.⁴⁰
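A quick simulation shows one consequence of overdifferencing. If $Y_t = \beta_1 + \beta_2 t + u_t$ with white-noise $u_t$ (variance $\sigma^2$) is differenced, the result is $\Delta Y_t = \beta_2 + u_t - u_{t-1}$, whose first-order autocorrelation is $-\sigma^2/(2\sigma^2) = -0.5$: differencing has manufactured serial correlation that was not in the original errors. The sketch below, with hypothetical parameter values, checks this numerically.

```python
import random
from typing import Sequence


def lag1_autocorrelation(x: Sequence[float]) -> float:
    """Sample first-order autocorrelation coefficient."""
    mean = sum(x) / len(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - mean) ** 2 for a in x)
    return num / den


rng = random.Random(42)
# Trend-stationary series: linear trend plus white noise (hypothetical parameters)
tsp = [1.0 + 0.05 * t + rng.gauss(0.0, 1.0) for t in range(5000)]
# Treating it as DSP and differencing gives dY_t = 0.05 + u_t - u_{t-1}
overdifferenced = [b - a for a, b in zip(tsp[:-1], tsp[1:])]
print(round(lag1_autocorrelation(overdifferenced), 3))  # theory: -0.5
```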

To see what happens if we confuse a TSP series with a DSP series or vice versa, Figure 21.10 shows the 
first-differenced LGDP and the residuals of LGDP estimated from the TSP regression (21.10.2): 
Figure 21.10 First differences ($\Delta$LGDP) and deviations from trend (RESI1) for logged GDP, 1947–2007 (quarterly).


A look at this figure tells us that the first differences of real logged GDP are stationary (as confirmed by regression (21.10.1)), but the residuals from the trend line (RESI1) are not.

In summary, “. . . it is very important to apply the right sort of stationarity transform to the data, 
if they are not already stationary. Most financial markets generate price, rate or yield data that are 
non-stationary because of stochastic rather than a deterministic trend. It is hardly ever appropriate 
to detrend the data by fitting a trend line and taking deviations. Instead the data should be detrended 
by taking first differences, usually of the log price or rates, because then the transformed stationary 
data will correspond to market returns.”⁴¹


40 For a detailed discussion of this, see Maddala et al., op. cit., Section 2.7.
41 Carol Alexander, op. cit., p. 324.
21.11 Cointegration: Regression of a Unit Root Time Series on Another Unit Root Time Series


We have warned that the regression of a nonstationary time series on another nonstationary time series may produce a spurious regression. Suppose we consider the LPCE and LDPI time series data introduced in Section 21.1 (see the book’s website for the actual data). Subjecting these time series individually to unit root analysis, you will find that they are both I(1); that is, they contain a stochastic trend. It is quite possible that the two series share the same common trend, so that the regression of one on the other will not necessarily be spurious.

To be specific, we use the U.S. economic time series data (see Section 21.1 and the book’s website) and 
run the following regression of LPCE on LDPI: 


$$\text{LPCE}_t = \beta_1 + \beta_2\,\text{LDPI}_t + u_t \qquad (21.11.1)$$


where L denotes logarithm. $\beta_2$ is the elasticity of real personal consumption expenditure with respect to real disposable personal income. For illustrative purposes, we will call it consumption elasticity. Let us write this as:


$$u_t = \text{LPCE}_t - \beta_1 - \beta_2\,\text{LDPI}_t \qquad (21.11.2)$$


Suppose we now subject $u_t$ to unit root analysis and find that it is stationary; that is, it is I(0). This is an interesting situation, for although $\text{LPCE}_t$ and $\text{LDPI}_t$ are individually I(1), that is, they have stochastic trends, their linear combination (21.11.2) is I(0). So to speak, the linear combination cancels out the stochastic trends in the two series. If you take consumption and income as two I(1) variables, savings, defined as (income − consumption), could be I(0). As a result, a regression of consumption on income as in Eq. (21.11.1) would be meaningful (i.e., not spurious). In this case we say that the two variables are cointegrated. Economically speaking, two variables will be cointegrated if they have a long-term, or equilibrium, relationship between them. Economic theory is often expressed in equilibrium terms, such as Fisher’s quantity theory of money or the theory of purchasing power parity (PPP), just to name a few.

In short, provided we check that the residuals from regressions like (21.11.1) are I(0) or stationary, the traditional regression methodology (including the $t$ and F tests) that we have considered extensively is applicable to data involving (nonstationary) time series. The valuable contribution of the concepts of unit root, cointegration, etc. is to force us to find out if the regression residuals are stationary. As Granger notes, “A test for cointegration can be thought of as a pre-test to avoid ‘spurious regression’ situations.”⁴²

In the language of cointegration theory, a regression such as Eq. (21.11.1) is known as a cointegrating regression, and the slope parameter $\beta_2$ is known as the cointegrating parameter. The concept of cointegration can be extended to a regression model containing k regressors. In this case we will have k cointegrating parameters.


Testing for Cointegration 


A number of methods for testing cointegration have been proposed in the literature. We consider here a 
comparatively simple method, namely the DF or ADF unit root test on the residuals estimated from the 
cointegrating regression. 


42 C. W. J. Granger, “Developments in the Study of Co-Integrated Economic Variables,” Oxford Bulletin of Economics and Statistics, vol. 48, 1986, p. 226.

43 There is this difference between tests for unit roots and tests for cointegration. As David A. Dickey, Dennis W. Jansen, and Daniel L. Thornton observe, “Tests for unit roots are performed on univariate [i.e., single] time series. In contrast, cointegration deals with the relationship among a group of variables, where (unconditionally) each has a unit root.” See their article, “A Primer on Cointegration with an Application to Money and Income,” Economic Review, Federal Reserve Bank of St. Louis, March–April 1991, p. 59. As the name suggests, this article is an excellent introduction to cointegration testing.
Engle–Granger (EG) or Augmented Engle–Granger (AEG) Test
We already know how to apply the DF or ADF unit root tests. All we have to do is estimate a regression like Eq. (21.11.1), obtain the residuals, and use the DF or ADF tests.⁴⁴ There is one precaution to exercise, however. Since the estimated $u_t$ are based on the estimated cointegrating parameter $\beta_2$, the DF and ADF critical significance values are not quite appropriate. Engle and Granger have calculated these values, which can be found in the references.⁴⁵ Therefore, the DF and ADF tests in the present context are known as Engle–Granger (EG) and augmented Engle–Granger (AEG) tests. Fortunately, several software packages now present these critical values along with other outputs.
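The two steps of the EG test can be sketched as follows: estimate the cointegrating regression by OLS, then run the DF regression on its residuals without an intercept and compare $\tau$ with an Engle–Granger critical value (about −3.34 at 5 percent, the value reported later in this section). The data are simulated under an assumed common stochastic trend, and the helper names are hypothetical; this is an illustration, not a substitute for a package that reports proper EG critical values.

```python
import random
from typing import List, Sequence, Tuple


def cointegrating_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, List[float]]:
    """Step 1: OLS of y_t on a constant and x_t; returns (b1, b2, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b2 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b1 = my - b2 * mx
    return b1, b2, [b - b1 - b2 * a for a, b in zip(x, y)]


def residual_df_tau(u: Sequence[float]) -> float:
    """Step 2: DF regression on the residuals, du_t = delta * u_{t-1} + e_t (no intercept)."""
    lagged = u[:-1]
    du = [b - a for a, b in zip(u[:-1], u[1:])]
    sxx = sum(v * v for v in lagged)
    delta = sum(v * d for v, d in zip(lagged, du)) / sxx
    s2 = sum((d - delta * v) ** 2 for v, d in zip(lagged, du)) / (len(du) - 1)
    return delta / (s2 / sxx) ** 0.5


EG_CRITICAL_5PCT = -3.34  # asymptotic Engle-Granger 5 percent value cited in the text

# Hypothetical data: log "income" and log "consumption" share one stochastic trend
rng = random.Random(7)
common, income, consumption = 0.0, [], []
for _ in range(300):
    common += rng.gauss(0.0, 1.0)  # I(1) common trend
    income.append(common + rng.gauss(0.0, 0.5))
    consumption.append(0.2 + 0.9 * common + rng.gauss(0.0, 0.5))

b1, b2, resid = cointegrating_regression(income, consumption)
tau = residual_df_tau(resid)
print(f"b2 = {b2:.3f}, tau = {tau:.2f}, cointegrated at 5%: {tau < EG_CRITICAL_5PCT}")
```

Because the residual DF regression uses estimated rather than true errors, the same $\tau$ formula is paired with the (more negative) EG critical values instead of the ordinary DF values.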

Let us illustrate these tests. Using the data introduced in Section 21.1 and found on the book’s website, we first regressed LPCE on LDPI and obtained the following regression:


$$\text{LPCE}_t = -0.1942 + 1.0114\,\text{LDPI}_t \qquad (21.11.3)$$
$$t = (-8.2328) \qquad (348.5425)$$
$$R^2 = 0.9980 \qquad d = 0.1558$$
Since LPCE and LDPI are individually nonstationary, there is the possibility that this regression is spurious. 
But when we performed a unit root test on the residuals obtained from Eq. (21.11.3), we obtained the 
following results: 
$$\Delta\hat u_t = -0.0764\,\hat u_{t-1} \qquad (21.11.4)$$
$$t = (-3.0458)$$
$$R^2 = 0.0369 \qquad d = 2.5389$$


The Engle–Granger asymptotic 5 percent and 10 percent critical values are about −3.34 and −3.04, respectively. Therefore, the residuals from the regression are not stationary at the 5 percent level, although the computed $\tau$ of −3.0458 is marginally more negative than the 10 percent critical value. This result would be difficult to accept, for economic theory suggests that there should be a stable relationship between PCE and DPI.

Let us reestimate Eq. (21.11.3) including the trend variable and then see if the residuals from this 
equation are stationary. We present the results first and then discuss what may be going on. 


$$\text{LPCE}_t = 2.8168 + 0.0037\,t + 0.5844\,\text{LDPI}_t \qquad (21.11.3a)$$
$$t = (21.35117) \qquad (22.9395) \qquad (31.2754)$$
$$R^2 = 0.9994 \qquad d = 0.2956$$
To see if the residuals from this regression are stationary, we obtained the following results (compare with Eq. [21.11.4]):
$$\Delta\hat u_t = -0.1498\,\hat u_{t-1} \qquad (21.11.4a)$$
$$t = (-4.4545)$$
$$R^2 = 0.0758 \qquad d = 2.395$$
Note: $\hat u_t$ is the residual from Eq. (21.11.3a).


44 If PCE and DPI are not cointegrated, any linear combination of them will be nonstationary and, therefore, the $u_t$ will also be nonstationary.

45 R. F. Engle and C. W. Granger, “Co-integration and Error Correction: Representation, Estimation and Testing,” Econometrica, vol. 55, 1987, pp. 251–276.
The DF test now shows that these residuals are stationary. Even if we use ADF with several lags, the residuals are still stationary.

What is going on here? Although the residuals from regression (21.11.3a) are stationary, that is, they are I(0), they are stationary around a deterministic time trend, the trend here being linear. That is, the residuals are I(0) plus a linear trend. As noted earlier, a time series may contain both a deterministic and a stochastic trend.

Before we proceed further, it should be noted that our time series data cover a long period of time (61 years). Because of structural changes in the U.S. economy over this period, our results and conclusions could well differ from one subperiod to another. In Exercise 21.28 you are asked to check for this possibility.


Cointegration and Error Correction Mechanism (ECM) 


We just showed that, allowing for the (linear) trend, LPCE and LDPI seem to be cointegrated; that is, there is a long-term, or equilibrium, relationship between the two. Of course, in the short run there may be disequilibrium. Therefore, we can treat the error term in the following equation as the “equilibrium error,” and we can use this error term to tie the short-run behavior of PCE to its long-run value:


$$u_t = \text{LPCE}_t - \beta_1 - \beta_2\,\text{LDPI}_t - \beta_3 t \qquad (21.11.5)$$


The error correction mechanism (ECM), first used by Sargan⁴⁶ and later popularized by Engle and Granger, corrects for disequilibrium. An important theorem, known as the Granger representation theorem, states that if two variables Y and X are cointegrated, the relationship between the two can be expressed as an ECM. To see what this means, let us revert to our PCE–DPI example. Now consider the following model:


$$\Delta\text{LPCE}_t = \alpha_0 + \alpha_1\,\Delta\text{LDPI}_t + \alpha_2 u_{t-1} + \varepsilon_t \qquad (21.11.6)$$


where $\varepsilon_t$ is a white noise error term and $u_{t-1}$ is the lagged value of the error term in Eq. (21.11.5).

ECM equation (21.11.6) states that $\Delta\text{LPCE}$ depends on $\Delta\text{LDPI}$ and also on the equilibrium error term.⁴⁷ If the latter is nonzero, then the model is out of equilibrium. Suppose $\Delta\text{LDPI}$ is zero and $u_{t-1}$ is positive. This means $\text{LPCE}_{t-1}$ is too high to be in equilibrium; that is, $\text{LPCE}_{t-1}$ is above its equilibrium value of $(\beta_1 + \beta_2\,\text{LDPI}_{t-1} + \beta_3(t-1))$. Since $\alpha_2$ is expected to be negative, the term $\alpha_2 u_{t-1}$ is negative and, therefore, $\Delta\text{LPCE}_t$ will be negative to restore the equilibrium. That is, if LPCE is above its equilibrium value, it will start falling in the next period to correct the equilibrium error; hence the name ECM. By the same token, if $u_{t-1}$ is negative (i.e., LPCE is below its equilibrium value), $\alpha_2 u_{t-1}$ will be positive, which will cause $\Delta\text{LPCE}_t$ to be positive, leading $\text{LPCE}_t$ to rise in period $t$. Thus, the absolute value of $\alpha_2$ decides how quickly the equilibrium is restored. In practice, we estimate $u_{t-1}$ by the lagged residual $\hat u_{t-1}$, where $\hat u_t = (\text{LPCE}_t - \hat\beta_1 - \hat\beta_2\,\text{LDPI}_t - \hat\beta_3 t)$. Keep in mind that the error correction coefficient $\alpha_2$ is expected to be negative (why?).

Returning to our illustrative example, the empirical counterpart of Eq. (21.11.6) is: 


$$\Delta\text{LPCE}_t = 0.0061 + 0.2967\,\Delta\text{LDPI}_t - 0.1223\,\hat u_{t-1} \qquad (21.11.7)$$
$$t = (9.6754) \qquad (6.2281) \qquad (-3.8461)$$
$$R^2 = 0.1658 \qquad d = 2.1496$$


Statistically, the ECM term is significant, suggesting that PCE adjusts to DPI with a lag: only about 12 
percent of the discrepancy between long-term and short-term PCE is corrected within a quarter. 
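If one is willing to assume that the same proportion of any disequilibrium is removed every quarter and that no new shocks arrive, the estimated coefficient −0.1223 also indicates how long adjustment takes: a share $(1 + \alpha_2)^k$ of an initial error remains after $k$ quarters, and half of it is gone after $\ln 0.5 / \ln(1 + \alpha_2)$ quarters. The sketch below (hypothetical function names) does this arithmetic; it is a back-of-the-envelope reading of Eq. (21.11.7), not an estimate reported in the text.

```python
import math


def remaining_share(alpha2: float, periods: int) -> float:
    """Share of an initial equilibrium error still uncorrected after `periods` periods,
    assuming the same fraction |alpha2| is removed every period and no new shocks."""
    return (1.0 + alpha2) ** periods


def half_life(alpha2: float) -> float:
    """Periods needed to remove half of an initial disequilibrium; needs -1 < alpha2 < 0."""
    return math.log(0.5) / math.log(1.0 + alpha2)


alpha2 = -0.1223  # estimated error correction coefficient in Eq. (21.11.7)
for quarters in (1, 4, 8):
    print(quarters, round(remaining_share(alpha2, quarters), 3))
print(round(half_life(alpha2), 1))  # roughly 5.3 quarters
```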


46 J. D. Sargan, “Wages and Prices in the United Kingdom: A Study in Econometric Methodology,” in K. F. Wallis and D. F. Hendry, eds., Quantitative Economics and Econometric Analysis, Basil Blackwell, Oxford, U.K., 1984.

47 The following discussion is based on Gary Koop, op. cit., pp. 159–160 and Kerry Peterson, op. cit., Section 8.5.
From regression (21.11.7) we see that the short-run consumption elasticity is about 0.30. The long-run elasticity is about 0.58, which can be seen from Eq. (21.11.3a).
Before we conclude this section, the caution sounded by S. G. Hall is worth remembering:

While the concept of cointegration is clearly an important theoretical underpinning of the error correction model there are still a number of problems surrounding its practical application; the critical values and small sample performance of many of these tests are unknown for a wide range of models; informed inspection of the correlogram may still be an important tool.⁴⁸


21.12 Some Economic Applications 


We conclude this chapter by considering some concrete examples. 


Example 21.1 M1 Monthly Money Supply in the United States, January 1959 to March 1, 2008 


Figure 21.11 shows the M1 money supply for the United States from January 1959 to March 1, 2008. From 
our knowledge of stationarity, it seems that the M1 money supply time series is nonstationary, which can 
be confirmed by unit root analysis. (Note: To save space, we have not given the actual data, which can be 
obtained from the Federal Reserve Board or the Federal Reserve Bank of St. Louis.) 


$$\Delta M_t = -0.1347 + 0.0293\,t - 0.0102\,M_{t-1} \qquad (21.12.1)$$
$$t = (-0.14) \qquad (2.62) \qquad (-2.30)$$
$$R^2 = 0.0130 \qquad d = 2.2325$$


Figure 21.11 U.S. money supply over 1959:01 to 2008:03 (money supply plotted against observation number).


48 S. G. Hall, “An Application of the Granger and Engle Two-Step Estimation Procedure to the United Kingdom Aggregate Wage Data,” Oxford Bulletin of Economics and Statistics, vol. 48, no. 3, August 1986, p. 238. See also John Y. Campbell and Pierre Perron, “Pitfalls and Opportunities: What Macroeconomists Should Know about Unit Roots,” NBER (National Bureau of Economic Research), Macroeconomics Annual 1991, pp. 141–219.
The 1, 5, and 10 percent critical $\tau$ values are −3.9811, −3.4210, and −3.1329. Since the $t$ value of −2.30 is less negative than any of these critical values, the conclusion is that the M1 time series is nonstationary; that is, it contains a unit root, or it is I(1). Even when several lagged values of $\Delta M_t$ (à la ADF) were introduced, the conclusion did not change. On the other hand, the first differences of the M1 money supply were found to be stationary (check this out).


Example 21.2 The U.S./U.K. Exchange Rate: January 1971 to April 2008 


Figure 21.12 gives the graph of the ($/£) exchange rate from January 1971 to April 2008, for a total of 286 observations. By now you should be able to spot this time series as nonstationary. Carrying out the unit root tests, we obtained the following $\tau$ statistics: −0.82 (no intercept, no trend), −1.96 (intercept), and −1.33 (intercept and trend). Each of these statistics, in absolute value, was less than its critical $\tau$ value from the appropriate DF tables, thus confirming the graphical impression that the U.S./U.K. exchange rate time series is nonstationary.
Figure 21.12 U.S./U.K. exchange rate: January 1971 to April 2008.
Example 21.3 U.S. Consumer Price Index (CPI), January 1947 to March 2008


Figure 21.13 shows the U.S. CPI from January 1947 to March 2008 for a total of 733 observations. The CPI 
series, like the M1 series considered previously, shows a sustained upward trend. The unit root exercise gave 
the following results: 


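$$\Delta\text{CPI}_t = -0.01082 + 0.00068\,t - 0.00096\,\text{CPI}_{t-1} + 0.40669\,\Delta\text{CPI}_{t-1} \qquad (21.12.2)$$
$$t = (\ldots) \qquad (4.27) \qquad (-1.77) \qquad (12.03)$$
$$R^2 = 0.3570 \qquad d = 1.9295$$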


The $t$ (= $\tau$) value of $\text{CPI}_{t-1}$ is −1.77. The 10 percent critical value is −3.1317. Since, in absolute terms, the computed $\tau$ is less than the critical $\tau$, the conclusion is that CPI is not a stationary time series. We can characterize it as having a stochastic trend (why?). However, if you take the first differences of the CPI series, you will find them to be stationary. Hence CPI is a difference-stationary (DS) time series.
Figure 21.13 U.S. CPI, January 1947 to March 2008.


Example 21.4 Are 3-Month and 6-Month Treasury Bill Rates Cointegrated? 


Figure 21.14 plots (constant maturity) 3-month and 6-month U.S. Treasury bill (T-bill) rates from January 
1982 to March 2008, for a total of 315 observations. Does the graph show that the two rates are cointegrated; 
that is, is there an equilibrium relationship between the two? From financial theory, we would expect that to be the case; otherwise arbitrageurs would exploit any discrepancy between the short and the long rates. First of all, let us see if the two time series are stationary.

On the basis of the pure random walk model (i.e., no intercept, no trend), both the rates were stationary. 
Including intercept, trend, and one lagged difference, the results suggested that the two rates might be trend 
stationary; the trend coefficient in both cases was negative and significant at about the 7 percent level. So, 
depending on which results we accept, the two rates are either stationary or trend stationary. 


Figure 21.14 Three- and six-month Treasury bill rates (constant maturity), 1982–2008 (rate in percent plotted against year).
Regressing the 6-month T-bill rate (TB6) on the 3-month T-bill rate (TB3), we obtained the following regression:


$$\text{TB6}_t = 0.0842 + 1.0078\,\text{TB3}_t \qquad (21.12.3)$$
$$t = (3.65) \qquad (252.39)$$
$$R^2 = 0.995 \qquad d = 0.4035$$


Applying the unit root test to the residuals from the preceding regression, we found that the 
residuals were stationary, suggesting that the 3- and 6-month T-bill rates were cointegrated. Using 
this knowledge, we obtained the following error correction model (ECM): 


$$\Delta\text{TB6}_t = -0.0047 + 0.8992\,\Delta\text{TB3}_t - 0.1855\,\hat u_{t-1} \qquad (21.12.4)$$
$$t = (-0.82) \qquad (47.77) \qquad (-5.69)$$
$$R^2 = 0.880 \qquad d = 1.5376$$
where $\hat u_{t-1}$ is the error correction term (the residual from the preceding regression) lagged one period. As these results show, about 19 percent of the discrepancy between the two rates in the previous month is eliminated this month.⁴⁹ Besides, short-run changes in the 3-month T-bill rate are quickly reflected in the 6-month T-bill rate, as the slope coefficient between the two is 0.8992. This should not be a surprising finding in view of the efficiency of the U.S. money markets.


Summary and Conclusions 


1. Regression analysis based on time series data implicitly assumes that the underlying time series are stationary. The classical $t$ tests, F tests, etc., are based on this assumption.

2. In practice most economic time series are nonstationary. 

3. A stochastic process is said to be weakly stationary if its mean, variance, and autocovariances are 
constant over time (i.e., they are time-invariant). 

4. At the informal level, weak stationarity can be tested by the correlogram of a time series, which is 
a graph of autocorrelation at various lags. For stationary time series, the correlogram tapers off quickly, 
whereas for nonstationary time series it dies off gradually. For a purely random series, the autocorrela- 
tions at all lags 1 and greater are zero. 

5. At the formal level, stationarity can be checked by finding out if the time series contains a unit root. The Dickey–Fuller (DF) and augmented Dickey–Fuller (ADF) tests can be used for this purpose.

6. An economic time series can be trend stationary (TS) or difference stationary (DS). A TS time 
series has a deterministic trend, whereas a DS time series has a variable, or stochastic, trend. The common 
practice of including the time or trend variable in a regression model to detrend the data is justifiable 
only for TS time series. The DF and ADF tests can be applied to determine whether a time series is TS 
or DS. 

7. Regression of one time series variable on one or more time series variables often can give nonsensical 
or spurious results. This phenomenon is known as spurious regression. One way to guard against it is to 
find out if the time series are cointegrated. 


⁴⁹Since both T-bill rates are in percent form, this would suggest that if the 6-month TB rate was higher than the 3-month TB rate by more than expected a priori in the last month, this month it will be reduced by 0.19 percentage points to restore the long-run relationship between the two interest rates. For the underlying theory about the relationship between short- and long-run interest rates, see any money and banking textbook and read up on the term structure of interest rates.
8. Cointegration means that despite being individually nonstationary, a linear combination of two or more time series can be stationary. The Engle-Granger (EG) and the augmented Engle-Granger (AEG) tests can be used to find out if two or more time series are cointegrated.

9. Cointegration of two (or more) time series suggests that there is a long-run, or equilibrium, relationship between them.

10. The error correction mechanism (ECM) developed by Engle and Granger is a means of reconciling the short-run behavior of an economic variable with its long-run behavior.

11. The field of time series econometrics is evolving. The established results and tests are in some cases tentative, and a lot more work remains. An important question that needs an answer is why some economic time series are stationary and others are nonstationary.
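Points 5 and 8 can be made concrete with a minimal Python sketch on simulated data. The sketch assumes the simplest forms of the tests. The DF regression is ΔY_t = β1 + δY_{t−1} + u_t, with an intercept, no trend, and no lagged differences, so it is not the ADF test. The Engle-Granger step applies a no-intercept DF regression to the residuals of Y on X. The function names, seed, and simulated series are hypothetical. The τ values the sketch prints must be compared with Dickey-Fuller critical values (Appendix D, Table D.7), not with the usual t table. For the Engle-Granger step the appropriate critical values are more negative still, because the residuals come from an estimated regression.

```python
import math
import random

def slope_and_t(y, x, intercept=True):
    """OLS of y on x (with or without an intercept); returns (slope, slope / se)."""
    n = len(y)
    mx, my = (sum(x) / n, sum(y) / n) if intercept else (0.0, 0.0)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    const = my - slope * mx
    rss = sum((b - const - slope * a) ** 2 for a, b in zip(x, y))
    dof = n - 2 if intercept else n - 1
    return slope, slope / math.sqrt(rss / dof / sxx)


def df_tau(y, intercept=True):
    """Dickey-Fuller tau statistic from  dY_t = [b1 +] delta * Y_{t-1} + u_t."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    return slope_and_t(dy, y[:-1], intercept)[1]


def eg_tau(y, x):
    """Engle-Granger: DF tau (no intercept) on residuals of Y_t = b1 + b2 X_t + u_t."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    b2 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b1 = my - b2 * mx
    resid = [b - b1 - b2 * a for a, b in zip(x, y)]
    return df_tau(resid, intercept=False)


def random_walk(n, rng):
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


rng = random.Random(5)                          # hypothetical seed
n = 500
walk = random_walk(n, rng)                      # has a unit root
ar1 = [0.0]
for _ in range(n - 1):
    ar1.append(0.5 * ar1[-1] + rng.gauss(0, 1))               # stationary AR(1)
y_coint = [1.0 + 2.0 * a + rng.gauss(0, 1) for a in walk]     # cointegrated with walk

tau_walk = df_tau(walk)
tau_ar1 = df_tau(ar1)
tau_eg = eg_tau(y_coint, walk)
print(f"DF tau: random walk {tau_walk:.2f}, AR(1) {tau_ar1:.2f}; EG tau {tau_eg:.2f}")
```

The stationary AR(1) series and the residuals of the genuinely cointegrated pair both produce large negative τ values. The random walk's τ will usually be too small in absolute value to reject the unit-root null hypothesis. The ADF version of the test would add lagged ΔY_t terms to the same regression.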


Multiple Choice Questions 


1. Sometimes we expect no relationship between two variables, yet regressing one time series variable on another often results in
a. A cause-and-effect relationship indication
b. High R² in excess of 0.9
c. Stationary series
d. Non-stationary series


2. Regressing one time series variable on another time series variable with which it has no meaningful relationship may often show a significant relationship. Such a regression is known as
a. Stationary series
b. White noise
c. Spurious regression
d. Random walk phenomenon


3. A collection of random variables ordered in time is known as
a. Stationary series
b. Stochastic process
c. Spurious variable
d. Non-stationary series


4. In time series data analysis, inferences about the underlying stochastic process are drawn from
a. Realization of that process
b. Sample data
c. Population data
d. Stationary data


5. A stochastic process whose mean, variance, and autocovariances are constant over time is known as
a. Trend stationary
b. Difference stationary
c. Weakly stationary
d. Strictly stationary


6. A time series whose probability distribution has all of its moments invariant over time is known as
a. Trend stationary
b. Difference stationary
c. Weakly stationary
d. Strictly stationary


7. A nonstationary time series is one with
a. Time-varying mean
b. Time-varying variance
c. Both (a) and (b)
d. Time-invariant mean and variance


8. If we can study the behavior of a time series only for the time period under consideration and cannot generalize it to other time periods, such a time series is known as
a. Weakly stationary
b. Strictly stationary
c. Stationary series
d. Nonstationary series

9. A purely random process is a stationary series with
a. Zero variance
b. Zero mean
c. Positive mean
d. Zero mean and zero variance

10. A white noise process is a stochastic process with
a. Zero mean
b. Constant variance
c. Serially uncorrelated error terms
d. All of the above

11. In a random walk without drift,
a. The effect of a shock persists throughout the time period
b. The effect of a past shock dies out over time
c. The effect of a shock drifts away quickly
d. There is no effect of a past shock

12. Which of the following time series processes is said to have infinite memory?
a. Random walk with drift
b. Random walk without drift
c. Both (a) and (b)
d. Neither (a) nor (b)

13. A series that is inherently non-stationary is
a. Random walk with drift
b. Random walk without drift
c. Both (a) and (b)
d. Neither (a) nor (b)

14. An example of a difference-stationary process is
a. Random walk with drift
b. Random walk without drift
c. Both (a) and (b)
d. Neither (a) nor (b)

15. A non-stationary series that becomes stationary after being differenced twice is
a. Integrated of order 0
b. Integrated of order 1
c. Integrated of order 2
d. Integrated of order 3
16. According to Granger and Newbold, a good rule of thumb for suspecting that an estimated regression is spurious is
a. High R² with low t values (R² > t)
b. High t values with a low Durbin-Watson d value (t > d)
c. High R² and a low Durbin-Watson d value (R² > d)
d. High R² and a low F value (R² > F)
17. The autocorrelation function at lag k can take values between
a. 0 and 1
b. −1 and 0
c. −1 and +1
d. 0 and 2
18. The test statistic used to test the stationarity of a time series is
a. Dickey-Fuller test
b. Engle-Granger test
c. Error correction mechanism
d. Augmented Dickey-Fuller test
19. The test statistic used to test the stationarity of a time series in the presence of correlated error terms is
a. Dickey-Fuller test 
b. Engle-Granger test 
c. Error correction mechanism 
d. Augmented Dickey-Fuller test 
20. Most unit root tests are based on the H₀ that the time series under consideration is
a. Stationary
b. Non-stationary
c. Strictly stationary
d. Weakly stationary
21. A time series that has a unit root can be made stationary by 
a. Detrending the time series 
b. First differencing the time series 
c. Either (a) or (b) 
d. Neither (a) nor (b) 
22. A trend-stationary process requires regressing the time series on
a. Its past values 
b. Other explanatory variables 
c. Time 
d. Its error terms 
23. If a time series is a difference-stationary process but we treat it as a trend-stationary process, this is called
a. Under-differencing 
b. Over-differencing 
c. Random walk 
d. Specification error 
24. If a time series is a trend-stationary process but we treat it as a difference-stationary process, this is called
a. Under-differencing 
b. Over-differencing 
c. Random walk 
d. Specification error 
25. Though two time series are individually non-stationary, their linear combination is stationary. This is
an example of two variables 
a. With random walk 
b. Spurious regression 
c. Being cointegrated 
d. With trend stationary 
26. A series may be trend stationary or difference stationary. The test statistic used to distinguish the two is
a. Dickey—Fuller test 
b. Engle—Granger test 
c. Error correction mechanism 
d. F-test 
27. Cointegration can be tested by the
a. Dickey—Fuller test 
b. Engle—Granger test 
c. Error correction mechanism 
d. F-test 
28. Cointegration of two or more time series suggests that there exists an equilibrium relationship between them that is
a. Long-run 
b. Short-run 
c. Very short-run 
d. Either (a) or (b) depending on the number of lags 
29. Which of the following corrects for disequilibrium in the time series?
a. Dickey—Fuller test 
b. Engle—Granger test 
c. Error correction mechanism 
d. F-test 
30. The Granger representation theorem states that if two variables Y and X are cointegrated, the 
relationship between the two can be expressed as 
a. Trend stationary series 
b. First-differencing stationary series 
c. Unit root series 
d. Error correction mechanism 


Exercises 


Questions 


21.1. What is meant by weak stationarity? 

21.2. What is meant by an integrated time series? 

21.3. What is the meaning of a unit root? 

21.4. If a time series is I(3), how many times would you have to difference it to make it stationary?
21.5. What are Dickey—Fuller (DF) and augmented DF tests? 

21.6. What are Engle-Granger (EG) and augmented EG tests? 

21.7. What is the meaning of cointegration? 
21.8. What is the difference, if any, between tests of unit roots and tests of cointegration?

21.9. What is spurious regression?

21.10. What is the connection between cointegration and spurious regression?

21.11. What is the difference between a deterministic trend and a stochastic trend?

21.12. What is meant by a trend-stationary process (TSP) and a difference-stationary process (DSP)?

21.13. What is a random walk (model)?

21.14. “For a random walk stochastic process, the variance is infinite.” Do you agree? Why?

21.15. What is the error correction mechanism (ECM)? What is its relationship with cointegration?


Empirical Exercises 


21.16. Using the U.S. economic time series data posted on the book’s website, obtain sample correlograms up to 36 lags for the time series LPCE, LDPI, LCP (profits), and LDIVIDENDS. What general pattern do you see? Intuitively, which of these time series seem to be stationary?

21.17. For each of the time series of Exercise 21.16, use the DF test to find out if these series contain a unit root. If a unit root exists, how would you characterize such a time series?

21.18. Continue with Exercise 21.17. How would you decide if the ADF test is more appropriate than the DF test?

21.19. Consider the dividends and profits time series given in the U.S. economic time series data posted on the book’s website. Since dividends depend on profits, consider the following simple model:

$$LDIVIDENDS_t = \beta_1 + \beta_2 LCP_t + u_t$$
a. Would you expect this regression to suffer from the spurious regression phenomenon? Why?
b. Are the logged dividends and logged profits time series cointegrated? How do you test for this explicitly? If, after testing, you find that they are cointegrated, would your answer in (a) change?
c. Employ the error correction mechanism (ECM) to study the short- and long-run behavior of dividends in relation to profits.
d. If you examine the LDIVIDENDS and LCP series individually, do they exhibit stochastic or deterministic trends? What tests do you use?
e. Assume LDIVIDENDS and LCP are cointegrated. Then, instead of regressing dividends on profits, you regress profits on dividends. Is such a regression valid?
21.20. Take the first differences of the time series given in the U.S. economic time series data posted on the book’s website and plot them. Also obtain a correlogram of each time series up to 36 lags. What strikes you about these correlograms?
21.21. Instead of regressing LDIVIDENDS on LCP in level form, suppose you regress the first difference of LDIVIDENDS on the first difference of LCP. Would you include the intercept in this regression? Why or why not? Show the calculations.
21.22. Continue with the previous exercise. How would you test the first-difference regression for stationarity? In the present example, what would you expect a priori and why? Show all the calculations.
*21.23. From the U.K. private sector housing starts (X) for the period 1948 to 1984, Terence Mills obtained the following regression results:†

$$\Delta X_t = 31.03 - 0.188\,X_{t-1}$$
se = (12.50) (0.080)
τ = (−2.35)

*Optional.
†Terence C. Mills, op. cit., p. 127. Notation slightly altered.
Note: The 5 percent critical τ value is −2.95 and the 10 percent critical τ value is −2.60.

a. On the basis of these results, is the housing starts time series stationary or nonstationary? Alternatively, is there a unit root in this time series? How do you know?

b. If you were to use the usual t test, is the observed τ value statistically significant? On this basis, would you have concluded that this time series is stationary?

c. Now consider the following regression results:

$$\Delta^2 X_t = \text{const.} - 1.39\,\Delta X_{t-1} + 0.313\,\Delta^2 X_{t-1}$$
se = (5.06) (0.236) (0.163)
τ = (−5.89)

where Δ² is the second-difference operator, that is, the first difference of the first difference. The estimated τ value is now statistically significant. What can you say now about the stationarity of the time series in question?
Note: The purpose of the preceding regression is to find out if there is a second unit root in the time series.

21.24. Generate two random walk series as indicated in Eqs. (21.7.1) and (21.7.2) and regress one on the other. Repeat this exercise, but now use their first differences, and verify that in this regression the R² value is about zero and the Durbin-Watson d is close to 2.

21.25. To show that two variables, each with a deterministic trend, can lead to spurious regression, Charemza et al. obtained the following regression based on 30 observations:*

$$\hat{Y}_t = 5.92 + 0.030\,X_t$$
t = (9.9) (21.2)
R² = 0.92 d = 0.06

where $Y_1 = 1, Y_2 = 2, \ldots, Y_n = n$ and $X_1 = 1, X_2 = 4, \ldots, X_n = n^2$.
a. What kind of trend does Y exhibit? And X?
b. Plot the two variables and plot the regression line. What general conclusion do you draw from this exercise?

21.26. Using data for the period 1971-I to 1988-IV for Canada, the following regression results were obtained:
1. $\ln M1_t = -10.2571 + 1.5975\,\ln GDP_t$
t = (−12.9422) (25.8865)
R² = 0.9463 d = 0.3254
2. $\Delta \ln M1_t = 0.0095 + 0.5833\,\Delta \ln GDP_t$
t = (2.4957) (1.8958)
R² = 0.0885 d = 1.7399
3. $\Delta \hat{u}_t = -0.1958\,\hat{u}_{t-1}$
τ = (−2.2521)
R² = 0.1118 d = 1.4767


*Charemza et al., op. cit., p. 93.
where M1 = M1 money supply and GDP = gross domestic product, both measured in billions of Canadian dollars, ln is the natural log, and $\hat{u}_t$ represents the estimated residuals from regression (1).

a. Interpret regressions (1) and (2). 

b. Do you suspect that regression (1) is spurious? Why? 

c. Is regression (2) spurious? How do you know? 

d. From the results of regression (3), would you change your conclusion in (b)? Why? 

e. Now consider the following regression: 


$$\Delta \ln M1_t = 0.0084 + 0.7340\,\Delta \ln GDP_t - 0.0811\,\hat{u}_{t-1}$$
t = (2.0496) (2.0636) (−0.8537)
R² = 0.1066 d = 1.6697


What does this regression tell you? Does this help you decide if regression (1) is spurious or not? 
21.27. The following regressions are based on the CPI data for the United States for the period 1960-2007, for a total of 48 annual observations:


1. $\Delta CPI_t = 0.0334\,CPI_{t-1}$
t = (12.87)


R² = 0.0703 d = 0.3663 RSS = 206.65
2. $\Delta CPI_t = 1.8662 + 0.0192\,CPI_{t-1}$
1S2 G86) 
R² = 0.249 d = 0.4462 RSS = 166.921
3. $\Delta CPI_t = 1.1611 + 0.5344\,t - 0.1077\,CPI_{t-1}$


(37) (480) (SA 
R² = 0.507 d = 0.6071 RSS = 109.608

where RSS = residual sum of squares. 

a. Examining the preceding regressions, what can you say about stationarity of the CPI time series? 

b. How would you choose among the three models?

c. Equation (1) is Eq. (3) minus the intercept and trend. Which test would you use to decide if the implied 
restrictions of model (1) are valid? (Hint: Use the Dickey—Fuller t and F tests. Use the approximate 
values given in Appendix D, Table D.7.) 

21.28. As noted in the text, there may be several structural breaks in the U.S. economic time series dataset introduced in Section 21.1. Dummy variables are a good way of incorporating these shifts in the data.

a. Using dummy variables to designate three different periods related to the oil embargoes in 1973 and 
1979, regress the log of personal consumption expenditures (LPCE) on the log of disposable personal 
income (LDPI). Has there been a change in the results? What is your decision about the unit root 
hypothesis now? 

b. Several websites list the official economic cycles that may have affected the U.S. economic time 
series data discussed in Section 21.1. See, for example, http://www.nber.org/cycles/cyclesmain.html. 
Using the information here, create dummy variables indicating some of the major cycles and check 
the results of regressing LPCE on LDPI. Has there been a change? 
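Exercises 21.24 and 21.25 can be explored with a short Python sketch. The sketch assumes that the random walks of Eqs. (21.7.1) and (21.7.2) are driftless with independent standard normal shocks. The seed, the sample size of 500, and the helper names are hypothetical. A single OLS helper reports R² and the Durbin-Watson d for three regressions: the levels regression, the first-difference regression, and the deterministic-trend regression of Exercise 21.25 (Y_t = t on X_t = t², t = 1, ..., 30).

```python
import random

def ols_r2_dw(y, x):
    """OLS of y on an intercept and x; returns (intercept, slope, R^2, Durbin-Watson d)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    const = my - slope * mx
    e = [b - const - slope * a for a, b in zip(x, y)]
    rss = sum(v * v for v in e)
    tss = sum((b - my) ** 2 for b in y)
    dw = sum((e[t] - e[t - 1]) ** 2 for t in range(1, n)) / rss
    return const, slope, 1 - rss / tss, dw


def random_walk(n, rng):
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


# Exercise 21.24: two independent random walks without drift
rng = random.Random(21)                     # hypothetical seed
n = 500
y_rw, x_rw = random_walk(n, rng), random_walk(n, rng)
_, _, r2_lev, dw_lev = ols_r2_dw(y_rw, x_rw)
dy = [y_rw[t] - y_rw[t - 1] for t in range(1, n)]
dx = [x_rw[t] - x_rw[t - 1] for t in range(1, n)]
_, _, r2_dif, dw_dif = ols_r2_dw(dy, dx)
print(f"levels: R2 = {r2_lev:.3f}, d = {dw_lev:.3f}")
print(f"first differences: R2 = {r2_dif:.3f}, d = {dw_dif:.3f}")

# Exercise 21.25: Y_t = t and X_t = t**2 for t = 1, ..., 30 (deterministic trends)
ts = range(1, 31)
b1_tr, b2_tr, r2_tr, dw_tr = ols_r2_dw([float(t) for t in ts], [float(t * t) for t in ts])
print(f"Y on X: intercept {b1_tr:.2f}, slope {b2_tr:.4f}, d = {dw_tr:.3f}")
```

In levels, d is close to zero, so any sizable R² exceeds d, which is the Granger-Newbold warning sign. In first differences, R² falls to about zero and d moves to about 2. The deterministic-trend case involves no randomness. Its intercept and slope come out at about 5.93 and 0.030, close to the reported 5.92 and 0.030, and d is about 0.06.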
Key to Multiple Choice Questions


1. (b) 2. (c) 3. (b) 4. (a) 5. (c) 6. (d) 7. (c) 8. (d) 9. (b)
10. (d) 11. (a) 12. (c) 13. (c) 14. (c) 15. (c) 16. (c) 17. (c) 18. (a)
19. (d) 20. (b) 21. (b) 22. (c) 23. (a) 24. (b) 25. (c) 26. (a) 27. (b)
28. (a) 29. (c) 30. (d)
Time Series Econometrics: Forecasting


We noted in the Introduction that forecasting is an important part of econometric analysis, for some people 
probably the most important. How do we forecast economic variables, such as GDP, inflation, exchange 
rates, stock prices, unemployment rates, and myriad other economic variables? In this chapter we discuss 
two methods of forecasting that have become quite popular: (1) autoregressive integrated moving average (ARIMA), popularly known as the Box-Jenkins methodology,¹ and (2) vector autoregression (VAR).

In this chapter we also discuss the special problems involved in forecasting prices of financial assets, 
such as stock prices and exchange rates. These asset prices are characterized by the phenomenon known as 
volatility clustering, that is, periods in which they exhibit wide swings for an extended time period followed 
by a period of comparative tranquility. One only has to look at the Dow Jones Index in the recent past. The 
so-called autoregressive conditional heteroscedasticity (ARCH) or generalized autoregressive condi- 
tional heteroscedasticity (GARCH) models can capture such volatility clustering. 

The topic of economic forecasting is vast, and specialized books have been written on this subject. Our 
objective in this chapter is to give the reader just a glimpse of this subject. The interested reader may consult 
the references for further study. Fortunately, most modern econometric packages have user-friendly introduc- 
tions to several techniques discussed in this chapter. 

The linkage between this chapter and the previous chapter is that the forecasting methods discussed below 
assume that the underlying time series are stationary or they can be made stationary with appropriate transfor- 
mations. As we progress through this chapter, you will see the use of the several concepts that we introduced 
in the last chapter. 


22.1 Approaches to Economic Forecasting 
Broadly speaking, there are five approaches to economic forecasting based on time series data: (1) exponential smoothing methods, (2) single-equation regression models, (3) simultaneous-equation regression models, (4) autoregressive integrated moving average (ARIMA) models, and (5) vector autoregression (VAR) models.


¹G. P. E. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, revised ed., Holden-Day, San Francisco, 1978.
Exponential Smoothing Methods²


These are essentially methods of fitting a suitable curve to historical data of a given time series. There are a 
variety of these methods, such as single exponential smoothing, Holt’s linear method, Holt—Winters’ method, 
and their variations. Although still used in several areas of business and economic forecasting, these are 
now supplemented (supplanted?) by the other four methods that follow. We will not discuss exponential 
smoothing methods in this chapter, for that would take us far afield. 


Single-Equation Regression Models 


The bulk of this book has been devoted to single-equation regression models. As an example of a single- 
equation model, consider the demand function for automobiles. On the basis of economic theory, we postulate 
that the demand for automobiles is a function of automobile prices, advertising expenditure, income of the
consumer, interest rate (as a measure of the cost of borrowing), and other relevant variables (e.g., family size, 
travel distance to work). From time series data, we estimate an appropriate model of auto demand (either 
linear, log-linear, or nonlinear), which can be used for forecasting demand for autos in the future. Of course, 
as noted in Chapter 5, forecasting errors increase rapidly if we go too far out in the future. 


Simultaneous-Equation Regression Models³


In Chapters 18, 19, and 20 we considered simultaneous-equation models. In their heyday during the 1960s 
and 1970s, elaborate models of the U.S. economy based on simultaneous equations dominated economic 
forecasting. But since then the glamor of such forecasting models has subsided because of their poor 
forecasting performance, especially since the 1973 and 1979 oil price shocks (due to OPEC oil embargoes) 
and also because of the so-called Lucas critique.⁴ The thrust of this critique, as you may recall, is that the
parameters estimated from an econometric model are dependent on the policy prevailing at the time the 
model was estimated and will change if there is a policy change. In short, the estimated parameters are not 
invariant in the presence of policy changes. 

For example, in October 1979 the Fed changed its monetary policy dramatically. Instead of targeting 
interest rates, it announced it would henceforth monitor the rate of growth of the money supply. With such a 
pronounced change, an econometric model estimated from past data will have little forecasting value in the 
new regime. These days the Fed’s emphasis has changed from controlling the money supply to controlling 
the short-term interest rate (the federal funds rate). 


ARIMA Models 


The publication by Box and Jenkins of Time Series Analysis: Forecasting and Control (op. cit.) ushered in a 
new generation of forecasting tools. Popularly known as the Box—Jenkins (BJ) methodology, but technically 
known as the ARIMA methodology, the emphasis of these methods is not on constructing single-equation or 


²For a comparatively simple exposition of these methods, see Spyros Makridakis, Steven C. Wheelwright, and Rob J. Hyndman, Forecasting Methods and Applications, 3d ed., John Wiley & Sons, New York, 1998.

³For a textbook treatment of the use of simultaneous-equation models in forecasting, see Robert S. Pindyck and Daniel L. Rubinfeld, Econometric Models & Economic Forecasts, 4th ed., McGraw-Hill, New York, 1998, Part III.

⁴Robert E. Lucas, “Econometric Policy Evaluation: A Critique,” in Carnegie-Rochester Conference Series, The Phillips Curve, North-Holland, Amsterdam, 1976, pp. 19-46. This article, among others, earned Lucas a Nobel Prize in economics.
simultaneous-equation models but on analyzing the probabilistic, or stochastic, properties of economic time series on their own under the philosophy “let the data speak for themselves.” Unlike the regression models, in which $Y_t$ is explained by k regressors $X_1, X_2, X_3, \ldots, X_k$, the BJ-type time series models allow $Y_t$ to be explained by past, or lagged, values of Y itself and stochastic error terms. For this reason, ARIMA models are sometimes called atheoretic models, because they are not derived from any economic theory, whereas economic theories are often the basis of simultaneous-equation models.

In passing, note that our emphasis in this chapter is on univariate ARIMA models, that is, ARIMA models 
pertaining to a single time series. But the analysis can be extended to multivariate ARIMA models. 


VAR Models 


VAR methodology superficially resembles simultaneous-equation modeling in that we consider several 
endogenous variables together. But each endogenous variable is explained by its lagged, or past, values and
the lagged values of all other endogenous variables in the model; usually, there are no exogenous variables 
in the model. 

In the rest of this chapter we discuss the fundamentals of Box—Jenkins and VAR approaches to economic 
forecasting. Our discussion is elementary and heuristic. The reader wishing to pursue this subject further is 
advised to consult the references.⁵


22.2 AR, MA, and ARIMA Modeling of Time Series Data 


To introduce several ideas, some old and some new, let us work with the GDP time series data for the United 
States introduced in Section 21.1 (see the book’s website for the actual data). A plot of this time series is 
already given in Figures 21.1 (undifferenced logged GDP) and 21.9 (first-differenced LGDP): recall that 
LGDP in level form is nonstationary but in the (first) differenced form it is stationary. 

If a time series is stationary, we can model it in a variety of ways. 


An Autoregressive (AR) Process 


Let $Y_t$ represent the logged GDP at time t. If we model $Y_t$ as
$$(Y_t - \delta) = \alpha_1 (Y_{t-1} - \delta) + u_t \qquad (22.2.1)$$


where $\delta$ is the mean of Y and where $u_t$ is an uncorrelated random error term with zero mean and constant variance $\sigma^2$ (i.e., it is white noise), then we say that $Y_t$ follows a first-order autoregressive, or AR(1), stochastic process, which we have already encountered in Chapter 12. Here the value of Y at time t depends on its value in the previous time period and a random term; the Y values are expressed as deviations from their mean value. In other words, this model says that the forecast value of Y at time t is simply some proportion (= $\alpha_1$) of its value at time (t − 1) plus a random shock or disturbance at time t; again, the Y values are expressed around their mean values.
But if we consider this model,

$$(Y_t - \delta) = \alpha_1 (Y_{t-1} - \delta) + \alpha_2 (Y_{t-2} - \delta) + u_t \qquad (22.2.2)$$


⁵See Pindyck and Rubinfeld, op. cit., Part 3; Alan Pankratz, Forecasting with Dynamic Regression Models, John Wiley & Sons,
New York, 1991 (this is an applied book); and Andrew Harvey, The Econometric Analysis of Time Series, The MIT Press, 2d 
ed., Cambridge, Mass., 1990 (this is a rather advanced book). A thorough but accessible discussion can also be found in 
Terence C. Mills, Time Series Techniques for Economists, Cambridge University Press, New York, 1990. 
then we say that $Y_t$ follows a second-order autoregressive, or AR(2), process. That is, the value of Y at time t depends on its value in the previous two time periods, the Y values being expressed around their mean value $\delta$. In general, we can have

$$(Y_t - \delta) = \alpha_1 (Y_{t-1} - \delta) + \alpha_2 (Y_{t-2} - \delta) + \cdots + \alpha_p (Y_{t-p} - \delta) + u_t \qquad (22.2.3)$$
in which case $Y_t$ is a pth-order autoregressive, or AR(p), process.
Notice that in all the preceding models only the current and previous Y values are involved; there are no other regressors. In this sense, we say that the “data speak for themselves.” They are a kind of reduced-form model that we encountered in our discussion of the simultaneous-equation models.
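An AR(p) process is easy to simulate directly from Eq. (22.2.3). The Python sketch below is a minimal illustration. The function names, the burn-in length, and the parameter values (δ = 5, α1 = 0.5) are assumptions chosen for the example. The sketch also shows the one-step-ahead AR(1) forecast δ + α1(Y_t − δ), which is the "proportion of the previous value" described for Eq. (22.2.1).

```python
import random

def simulate_ar(delta, alphas, sigma, n, burn=200, seed=0):
    """Simulate (Y_t - delta) = sum_i alpha_i (Y_{t-i} - delta) + u_t, u_t ~ N(0, sigma^2).

    The first `burn` draws are discarded so the start-up values (all at the
    mean) do not influence the returned sample.
    """
    rng = random.Random(seed)
    devs = [0.0] * len(alphas)          # presample deviations from the mean
    out = []
    for t in range(n + burn):
        # devs[-1] is Y_{t-1} - delta, devs[-2] is Y_{t-2} - delta, ...
        d_t = sum(a * devs[-i - 1] for i, a in enumerate(alphas)) + rng.gauss(0, sigma)
        devs.append(d_t)
        if t >= burn:
            out.append(delta + d_t)
    return out


def ar1_forecast(y_last, delta, alpha1):
    """One-step-ahead AR(1) forecast: delta + alpha1 * (Y_t - delta)."""
    return delta + alpha1 * (y_last - delta)


y = simulate_ar(delta=5.0, alphas=[0.5], sigma=1.0, n=5000, seed=11)
mean = sum(y) / len(y)
dev = [v - mean for v in y]
rho1 = sum(dev[t] * dev[t + 1] for t in range(len(y) - 1)) / sum(v * v for v in dev)
print(f"sample mean {mean:.3f}, lag-1 autocorrelation {rho1:.3f}")
print(ar1_forecast(7.0, delta=5.0, alpha1=0.5))   # 6.0
```

Passing `alphas=[0.5, 0.3]` instead gives an AR(2), as in Eq. (22.2.2). For the AR(1) draw, the lag-1 sample autocorrelation should be close to α1 = 0.5 and the sample mean close to δ.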


A Moving Average (MA) Process 


The AR process just discussed is not the only mechanism that may have generated Y. Suppose we model Y as follows:

$$Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1} \qquad (22.2.4)$$

where $\mu$ is a constant and u, as before, is the white noise stochastic error term. Here Y at time t is equal to a constant plus a moving average of the current and past error terms. Thus, in the present case, we say that Y follows a first-order moving average, or an MA(1), process.

But if Y follows the expression

$$Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1} + \beta_2 u_{t-2} \qquad (22.2.5)$$
then it is an MA(2) process. More generally,
$$Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1} + \beta_2 u_{t-2} + \cdots + \beta_q u_{t-q} \qquad (22.2.6)$$

is an MA(q) process. In short, a moving average process is simply a linear combination of white noise error terms.
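A quick simulation shows the defining feature of an MA(q) process: its autocorrelations are zero beyond lag q. In this hypothetical Python sketch, Eq. (22.2.5) is simulated with μ = 0 and (β0, β1, β2) = (1, 0.6, 0.3), values chosen only for illustration. For these values the theoretical autocorrelations are ρ1 = (β0β1 + β1β2)/(β0² + β1² + β2²) ≈ 0.54 and ρ2 = β0β2/(β0² + β1² + β2²) ≈ 0.21, with ρk = 0 for k ≥ 3.

```python
import random

def simulate_ma(mu, betas, n, seed=0):
    """Simulate Y_t = mu + beta_0 u_t + beta_1 u_{t-1} + ... + beta_q u_{t-q}, u_t ~ N(0, 1)."""
    rng = random.Random(seed)
    q = len(betas) - 1
    u = [rng.gauss(0, 1) for _ in range(n + q)]   # q presample shocks, then n more
    # u[t + q] is the current shock for observation t
    return [mu + sum(b * u[t + q - j] for j, b in enumerate(betas)) for t in range(n)]


def sample_acf(y, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(y)
    m = sum(y) / n
    d = [v - m for v in y]
    g0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / g0 for k in range(1, max_lag + 1)]


y = simulate_ma(mu=0.0, betas=[1.0, 0.6, 0.3], n=5000, seed=3)
acf = sample_acf(y, 5)
print([round(r, 3) for r in acf])
```

The sample ACF at lags 3 to 5 should be within a few hundredths of zero. This cutoff is the pattern the correlogram is used to detect at the identification stage.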


An Autoregressive and Moving Average (ARMA) Process 


Of course, it is quite likely that Y has characteristics of both AR and MA and is therefore ARMA. Thus, $Y_t$ follows an ARMA(1, 1) process if it can be written as

$$Y_t = \theta + \alpha_1 Y_{t-1} + \beta_0 u_t + \beta_1 u_{t-1} \qquad (22.2.7)$$

because there is one autoregressive and one moving average term. In Eq. (22.2.7) $\theta$ represents a constant term. In general, in an ARMA(p, q) process, there will be p autoregressive and q moving average terms.
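Eq. (22.2.7) can be generated recursively once presample values are fixed. The sketch below feeds a single unit shock through an ARMA(1, 1) with θ = 0, α1 = 0.5, β0 = 1, β1 = 0.4. The function name and parameter values are hypothetical, and the presample Y and u are set to zero. The output shows that the MA term affects only the first lag, while the AR term makes the effect decay geometrically.

```python
def arma11(theta, alpha1, beta0, beta1, shocks, y_init=0.0, u_init=0.0):
    """Generate Y_t = theta + alpha1 * Y_{t-1} + beta0 * u_t + beta1 * u_{t-1}.

    `shocks` supplies u_0, u_1, ...; y_init and u_init stand in for the
    presample values Y_{-1} and u_{-1}.
    """
    y_prev, u_prev, path = y_init, u_init, []
    for u in shocks:
        y_t = theta + alpha1 * y_prev + beta0 * u + beta1 * u_prev
        path.append(y_t)
        y_prev, u_prev = y_t, u
    return path


# Response to a single unit shock at t = 0 (theta = 0 so the path starts from zero)
irf = arma11(theta=0.0, alpha1=0.5, beta0=1.0, beta1=0.4, shocks=[1.0, 0.0, 0.0, 0.0, 0.0])
print([round(v, 4) for v in irf])   # [1.0, 0.9, 0.45, 0.225, 0.1125]
```

At lag 1 the response is α1β0 + β1 = 0.9. After that, each value is α1 = 0.5 times the previous one.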


An Autoregressive Integrated Moving Average (ARIMA) Process 


The time series models we have already discussed are based on the assumption that the time series involved 
are (weakly) stationary in the sense defined in Chapter 21. Briefly, the mean and variance for a weakly 
stationary time series are constant and its covariance is time-invariant. But we know that many economic 
time series are nonstationary, that is, they are integrated; for example, the economic time series introduced in 
Section 21.1 of Chapter 21 are integrated. 

But we also saw in Chapter 21 that if a time series is integrated of order 1 (i.e., it is I(1)), its first differences are I(0), that is, stationary. Similarly, if a time series is I(2), its second difference is I(0). In general, if a time series is I(d), after differencing it d times we obtain an I(0) series.
Therefore, if we have to difference a time series d times to make it stationary and then apply the ARMA(p, q) model to it, we say that the original time series is ARIMA(p, d, q), that is, it is an autoregressive integrated moving average time series, where p denotes the number of autoregressive terms, d the number of times the series has to be differenced before it becomes stationary, and q the number of moving average terms. Thus, an ARIMA(2, 1, 2) time series has to be differenced once (d = 1) before it becomes stationary, and the (first-differenced) stationary time series can be modeled as an ARMA(2, 2) process, that is, it has two AR and two MA terms. Of course, if d = 0 (i.e., a series is stationary to begin with), ARIMA(p, 0, q) = ARMA(p, q). Note that an ARIMA(p, 0, 0) process means a purely AR(p) stationary process; an ARIMA(0, 0, q) means a purely MA(q) stationary process. Given the values of p, d, and q, one can tell what process is being modeled.
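The d in ARIMA(p, d, q) counts how many times the first-difference operator is applied before an ARMA(p, q) model is fitted. The minimal Python sketch below shows differencing and its inverse (cumulating back to levels), which is what turns forecasts of a differenced series back into forecasts of the original series. The function names are hypothetical. A deterministic quadratic is used only because the arithmetic is easy to check by hand; it is not itself an I(2) stochastic series.

```python
def difference(y, d=1):
    """Apply the first-difference operator d times: returns the d-th difference of y."""
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y


def undifference(dy, y0):
    """Undo one round of differencing, given the first level y0 of the original series."""
    levels = [y0]
    for change in dy:
        levels.append(levels[-1] + change)
    return levels


quad = [t * t for t in range(1, 8)]      # 1, 4, 9, ..., 49
print(difference(quad, 1))               # [3, 5, 7, 9, 11, 13]
print(difference(quad, 2))               # [2, 2, 2, 2, 2]
print(undifference(difference(quad, 1), quad[0]) == quad)   # True
```

Each round of differencing shortens the series by one observation. Recovering levels after d rounds requires d starting values, one per round.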
The important point to note is that to use the Box—Jenkins methodology, we must have either a stationary 
time series or a time series that is stationary after one or more differencings. The reason for assuming station- 
arity can be explained as follows: 
The objective of B-J [Box—Jenkins] is to identify and estimate a statistical model which can be interpreted as 
having generated the sample data. If this estimated model is then to be used for forecasting we must assume that 
the features of this model are constant through time, and particularly over future time periods. Thus the simple 
reason for requiring stationary data is that any model which is inferred from these data can itself be interpreted as 
stationary or stable, therefore providing [a] valid basis for forecasting.⁶


22.3 The Box—Jenkins (BJ) Methodology 


The million-dollar question obviously is: Looking at a time series, such as the U.S. LGDP series in Figure 
21.1, how does one know whether it follows a purely AR process (and if so, what is the value of p) or a purely 
MA process (and if so, what is the value of q) or an ARMA process (and if so, what are the values of p and 
q) or an ARIMA process, in which case we must know the values of p, d, and q. The BJ methodology comes 
in handy in answering the preceding question. The method consists of four steps: 


Step 1. Identification. That is, find out the appropriate values of p, d, and q. We will show shortly how the 
correlogram and partial correlogram aid in this task. 

Step 2. Estimation. Having identified the appropriate p and q values, the next stage is to estimate the 
parameters of the autoregressive and moving average terms included in the model. Sometimes this calcu- 
lation can be done by simple least squares but sometimes we will have to resort to nonlinear (in parameter) 
estimation methods. Since this task is now routinely handled by several statistical packages, we do not 
have to worry about the actual mathematics of estimation; the enterprising student may consult the refer- 
ences on that. 

Step 3. Diagnostic checking. Having chosen a particular ARIMA model and estimated its parameters, we
next see whether the chosen model fits the data reasonably well, for it is possible that another ARIMA model
might do the job as well. This is why Box-Jenkins ARIMA modeling is more an art than a science;
considerable skill is required to choose the right ARIMA model. One simple test of the chosen model is to see
whether the residuals estimated from this model are white noise; if they are, we can accept the particular fit;
if not, we must start over. Thus, the BJ methodology is an iterative process (see Figure 22.1).


Step 4. Forecasting. One of the reasons for the popularity of ARIMA modeling is its success in
forecasting. In many cases, the forecasts obtained by this method are more reliable than those obtained
from traditional econometric modeling, particularly for short-term forecasts. Of course, each case
must be checked.


⁶Michael Pokorny, An Introduction to Econometrics, Basil Blackwell, New York, 1987, p. 343.
1. Identification of the model (choosing tentative p, d, q)
2. Parameter estimation of the chosen model
3. Diagnostic checking: Are the estimated residuals white noise?
   Yes: go to Step 4. No: return to Step 1.
4. Forecasting

Figure 22.1 The Box-Jenkins methodology.


With this general discussion, let us look at these four steps in some detail. Throughout, we will use the 
GDP data introduced in Section 21.1 (see the book's website for the actual data) to illustrate the various 
points. 


22.4 Identification 


The chief tools in identification are the autocorrelation function (ACF), the partial autocorrelation 
function (PACF), and the resulting correlograms, which are simply the plots of ACFs and PACFs against 
the lag length. 

In the previous chapter we defined the (population) ACF ($\rho_k$) and the sample ACF ($\hat{\rho}_k$). The concept of
partial autocorrelation is analogous to the concept of partial regression coefficient. In the k-variable multiple
regression model, the kth regression coefficient $\beta_k$ measures the rate of change in the mean value of the
regressand for a unit change in the kth regressor $X_k$, holding the influence of all other regressors constant.

In similar fashion, the partial autocorrelation $\rho_{kk}$ measures correlation between (time series) observations
that are k time periods apart after controlling for correlations at intermediate lags (i.e., lags less than k).
In other words, partial autocorrelation is the correlation between $Y_t$ and $Y_{t-k}$ after removing the effect of the
intermediate Y's.⁷ In Section 7.11 we already introduced the concept of partial correlation in the regression
context and showed its relation to simple correlations. Such partial correlations are now routinely computed
by most statistical packages.
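To make the two definitions concrete, here is a minimal sketch in plain Python (our choice of tool; the text itself relies on EViews). It computes the sample ACF and obtains the PACF with the Durbin-Levinson recursion, one standard way of getting $\rho_{kk}$ from the sample autocorrelations; some packages instead offer an OLS-regression variant, which gives slightly different numbers. The AR(1) series below is simulated for illustration and is not the GDP data.

```python
import random

def sample_acf(y, max_lag):
    """Sample autocorrelations rho_hat_k for k = 0, 1, ..., max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

def sample_pacf(y, max_lag):
    """Partial autocorrelations rho_kk via the Durbin-Levinson recursion.

    rho_kk is the last coefficient of an AR(k) fitted to the sample ACF
    (Yule-Walker), i.e. the Y_t, Y_{t-k} correlation net of intermediate lags.
    """
    r = sample_acf(y, max_lag)
    pacf = [1.0]
    phi = []  # phi[j] holds phi_{k-1, j+1} from the previous order
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# Simulated AR(1): Y_t = 0.6 Y_{t-1} + u_t (illustrative data only).
rng = random.Random(1)
y = [0.0]
for _ in range(4999):
    y.append(0.6 * y[-1] + rng.gauss(0.0, 1.0))

acf_vals = sample_acf(y, 5)
pacf_vals = sample_pacf(y, 5)
print([round(v, 3) for v in acf_vals])   # decays roughly like 0.6**k
print([round(v, 3) for v in pacf_vals])  # large at lag 1, near zero afterwards
```

For an AR(1) process the ACF decays gradually while the PACF cuts off after lag 1, which is the kind of signature the identification step looks for.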

In Figure 22.2 we show the correlogram (panel a) and partial correlogram (panel b) of the LGDP series.
From this figure, two facts stand out: First, the ACF declines very slowly; as shown in Figure 21.8, the ACFs up
to about 22 lags are individually statistically significant, for they all lie outside the 95 percent confidence
bounds. Second, after the second lag the PACF drops dramatically, and most PACFs after lag 2 are
statistically insignificant, save perhaps for lag 13.

Since the U.S. LGDP time series is not stationary, we have to make it stationary before we can apply
the Box-Jenkins methodology. In Figure 21.9 we plotted the first differences of LGDP. Unlike Figure 21.1,
we do not observe any trend in this series, perhaps suggesting that the first-differenced LGDP time series
is stationary.⁸ A formal application of the Dickey-Fuller unit root test shows that this is indeed the case.
We can also see this visually from the estimated ACF and PACF correlograms given in panels (a) and (b) of
Figure 22.3. Now we have a much different pattern of ACF and PACF. The ACFs at lags 1, 2, and 5 seem
statistically different from zero; recall from Chapter 21 that the approximate 95 percent confidence limits for
$\hat{\rho}_k$ are $-0.1254$ and $+0.1254$. (Note: As discussed in Chapter 21, these confidence limits are asymptotic and
so can be considered approximate.) But at all other lags, the ACFs are not statistically different from zero. For the
partial autocorrelations, only lags 1 and 12 seem to be statistically different from zero.

⁷In time series data a large proportion of the correlation between $Y_t$ and $Y_{t-k}$ may be due to the correlations they have with the
intervening lags $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-k+1}$. The partial correlation $\rho_{kk}$ removes the influence of these intervening variables.

⁸It is hard to tell whether the variance of this series is stationary, especially around 1979-1980. The oil embargo of 1979 and
a significant change in the Fed's monetary policy in 1979 may have something to do with our difficulty.

Figure 22.2 (a) Correlogram and (b) partial correlogram for LGDP, United States, 1947-I to 2007-IV.

Figure 22.3 (a) Correlogram and (b) partial correlogram for first differences of LGDP, United States, 1947-I to 2007-IV.
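The ±0.1254 limits quoted above are consistent with the usual large-sample rule $\pm 1.96/\sqrt{n}$ with n = 244, the number of quarters from 1947-I to 2007-IV (that choice of n is our inference; with the 243 first-differenced observations the limit is about 0.126, an immaterial difference). The sketch below computes the limits and flags the lags that fall outside them. The autocorrelations in the demo are hypothetical numbers, not the book's estimates.

```python
import math

def acf_bounds(n, z=1.96):
    """Approximate 95% limits, +/- z / sqrt(n), for a sample autocorrelation
    when the true series is white noise (large-sample approximation)."""
    half = z / math.sqrt(n)
    return -half, half

def significant_lags(acf_values, n, z=1.96):
    """Lags k >= 1 whose sample autocorrelation lies outside the limits."""
    _, upper = acf_bounds(n, z)
    return [k for k, r in enumerate(acf_values) if k >= 1 and abs(r) > upper]

lower, upper = acf_bounds(244)   # 244 quarterly observations
print(round(upper, 4))           # 0.1255 (the text truncates it to 0.1254)

# Hypothetical autocorrelations for lags 0..6 (not the book's estimates).
demo = [1.0, 0.30, 0.20, 0.05, -0.03, 0.14, 0.02]
print(significant_lags(demo, 244))   # [1, 2, 5]
```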

Now how do the correlograms given in Figure 22.3 enable us to find the ARMA pattern of the LGDP time
series? (Note: We will consider only the first-differenced LGDP series because it is stationary.) One way of
accomplishing this is to consider the ACF and PACF and the associated correlograms of a selected number of
ARMA processes, such as AR(1), AR(2), MA(1), MA(2), ARMA(1, 1), ARMA(2, 2), and so on. Since each
of these stochastic processes exhibits typical patterns of ACF and PACF, if the time series under study fits one
of these patterns we can identify the time series with that process. Of course, we will have to apply diagnostic
tests to find out if the chosen ARMA model is reasonably accurate.

Studying the properties of the various standard ARIMA processes would consume a lot of space. What
we plan to do is to give general guidelines (see Table 22.1); the references give the details of the various
stochastic processes.


Table 22.1 Theoretical Patterns of ACF and PACF

Type of Model   Typical Pattern of ACF                    Typical Pattern of PACF
AR(p)           Decays exponentially or with damped       Significant spikes through lags p
                sine wave pattern or both
MA(q)           Significant spikes through lags q         Declines exponentially
ARMA(p, q)      Exponential decay                         Exponential decay

Note: The terms exponential and geometric decay mean the same thing (recall our discussion of the Koyck distributed lag).


Notice that the ACFs and PACFs of AR(p) and MA(q) processes have opposite patterns: in the AR(p) case
the ACF declines geometrically or exponentially but the PACF cuts off after a certain number of lags, whereas
the opposite happens in an MA(q) process.

Graphically, these patterns are shown in Figure 22.4.
Figure 22.4 ACF and PACF of selected stochastic processes: (a) AR(2): $\alpha_1 = 0.5$, $\alpha_2 = 0.3$; (b) MA(2): $\beta_1 = 0.5$, $\beta_2 = 0.3$;
(c) ARMA(1, 1): $\alpha_1 = 0.5$, $\beta_1 = 0.5$.
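The ACF column of Table 22.1 can be checked numerically for two of the processes in Figure 22.4. The sketch assumes the conventions $Y_t = \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + u_t$ for the AR(2) and $Y_t = u_t + \beta_1 u_{t-1} + \beta_2 u_{t-2}$ for the MA(2), with the parameter values from the figure caption. The AR(2) autocorrelations follow the Yule-Walker recursion, and the MA(2) autocorrelations are exactly zero beyond lag 2.

```python
def ar2_acf(a1, a2, max_lag):
    """Theoretical ACF of Y_t = a1*Y_{t-1} + a2*Y_{t-2} + u_t (stationary case),
    from the Yule-Walker recursion rho_k = a1*rho_{k-1} + a2*rho_{k-2}."""
    rho = [1.0, a1 / (1.0 - a2)]
    for k in range(2, max_lag + 1):
        rho.append(a1 * rho[k - 1] + a2 * rho[k - 2])
    return rho[:max_lag + 1]

def ma2_acf(b1, b2, max_lag):
    """Theoretical ACF of Y_t = u_t + b1*u_{t-1} + b2*u_{t-2}: zero beyond lag 2."""
    denom = 1.0 + b1 ** 2 + b2 ** 2
    rho = [1.0, (b1 + b1 * b2) / denom, b2 / denom] + [0.0] * max(0, max_lag - 2)
    return rho[:max_lag + 1]

# Parameter values from the Figure 22.4 caption.
print([round(r, 3) for r in ar2_acf(0.5, 0.3, 6)])  # slow, geometric-type decay
print([round(r, 3) for r in ma2_acf(0.5, 0.3, 6)])  # cuts off after lag 2
```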


A Warning 


Since in practice we do not observe the theoretical ACFs and PACFs and rely on their sample counterparts,
the estimated ACFs and PACFs will not exactly match their theoretical counterparts. What we are looking
for is a resemblance between the theoretical and sample ACFs and PACFs that can point us in the right
direction in constructing ARIMA models. That is why ARIMA modeling requires a great deal of skill,
which of course comes from practice.


ARIMA Identification of U.S. GDP 
Returning to the correlogram and partial correlogram of the stationary (after first-differencing) U.S. LGDP
for 1947-I to 2007-IV given in Figure 22.3, what do we see?

Remembering that the ACF and PACF are sample quantities, we do not have a nice pattern as suggested
in Table 22.1. The autocorrelations (panel a) decline for the first two lags and then, with the exception of
lag 5, the rest of them are not statistically different from zero (the gray area shown in the figures gives the
approximate 95 percent confidence limits). The partial autocorrelations (panel b) with spikes at lags 1 and 12
seem statistically significant but the rest are not; if the partial correlation coefficient were significant only at
lag 1, we could have identified this as an AR(1) model. Because the significant autocorrelations are concentrated
at lags 1 and 2 (lag 5 aside), a cutoff that Table 22.1 associates with an MA process, let us assume that the process
that generated the (first-differenced) LGDP series is an MA(2) process. Keep in mind that unless the ACF and PACF
are well defined, it is hard to choose a model without trial and error. The reader is encouraged to try other
ARIMA models on the first-differenced LGDP series.


22.5 Estimation of the ARIMA Model 


Let $Y_t^*$ denote the first differences of U.S. logged GDP. Then our tentatively identified MA model is

$$Y_t^* = \mu + \beta_1 u_{t-1} + \beta_2 u_{t-2} + u_t \tag{22.5.1}$$

Using EViews, we obtained the following estimates:

$$
\begin{aligned}
\hat{Y}_t^* &= 0.00823 + 0.2923\,u_{t-1} + 0.2046\,u_{t-2}\\
\text{se} &= (0.00088) \qquad (0.0632) \qquad (0.0632)\\
t &= (9.33) \qquad\qquad (4.62) \qquad\quad (3.23)\\
R^2 &= 0.1217 \qquad d = 1.9719
\end{aligned}
\tag{22.5.2}
$$

We leave it as an exercise for the reader to estimate other ARIMA models for the first-differenced LGDP
series.
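Step 2 noted that MA parameters require nonlinear estimation. As a rough stand-in for what a package does internally, here is a minimal conditional-least-squares sketch, assuming the pre-sample residuals are zero, $\mu$ is estimated by the sample mean, and $(\beta_1, \beta_2)$ are found by a coarse grid search. This is not EViews' estimator and, since the LGDP data are not reproduced here, it runs on simulated MA(2) data; it will not reproduce Eq. (22.5.2).

```python
import random

def css_ma2(y, mu, b1, b2, stop_at=float("inf")):
    """Conditional sum of squares for Y*_t = mu + b1*u_{t-1} + b2*u_{t-2} + u_t.

    Residuals are built recursively, u_t = Y*_t - mu - b1*u_{t-1} - b2*u_{t-2},
    with the two pre-sample residuals set to zero. Returns inf as soon as the
    running sum reaches stop_at, since it can no longer beat the best value.
    """
    css, u1, u2 = 0.0, 0.0, 0.0
    for v in y:
        u = v - mu - b1 * u1 - b2 * u2
        css += u * u
        if css >= stop_at:
            return float("inf")
        u1, u2 = u, u1
    return css

def fit_ma2_grid(y, step=0.1, bound=0.9):
    """Crude CSS fit: mu = sample mean, (b1, b2) chosen from a grid on [-bound, bound]."""
    mu = sum(y) / len(y)
    n_steps = int(round(2 * bound / step))
    grid = [round(-bound + i * step, 10) for i in range(n_steps + 1)]
    best = (float("inf"), None, None)
    for b1 in grid:
        for b2 in grid:
            css = css_ma2(y, mu, b1, b2, stop_at=best[0])
            if css < best[0]:
                best = (css, b1, b2)
    return mu, best[1], best[2]

# Simulated MA(2) with mu = 0.01, b1 = 0.5, b2 = 0.3 (illustrative, not LGDP).
rng = random.Random(7)
e = [rng.gauss(0.0, 1.0) for _ in range(1502)]
y = [0.01 + e[t] + 0.5 * e[t - 1] + 0.3 * e[t - 2] for t in range(2, 1502)]
mu_hat, b1_hat, b2_hat = fit_ma2_grid(y)
print(round(mu_hat, 3), b1_hat, b2_hat)
```

A real implementation would refine the grid optimum with a numerical optimizer and compute standard errors; the grid simply keeps the sketch short and transparent.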


22.6 Diagnostic Checking 


How do we know that the model in Eq. (22.5.2) is a reasonable fit to the data? One simple diagnostic is to
obtain the residuals from Eq. (22.5.2) and compute their ACF and PACF, say, up to lag 25. The estimated
ACF and PACF are shown in Figure 22.5. As this figure shows, none of the autocorrelations (panel a) or
partial autocorrelations (panel b) is individually statistically significant. Nor is the sum of the 25 squared
autocorrelations, as measured by the Box-Pierce Q and Ljung-Box (LB) statistics (see Chapter 21),
statistically significant. In other words, the correlograms of both autocorrelation and partial autocorrelation
give the impression that the residuals estimated from Eq. (22.5.2) are purely random. Hence, there may not
be any need to look for another ARIMA model.
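The two portmanteau statistics mentioned above are easy to compute from residual autocorrelations: $Q = n\sum_{k=1}^{m}\hat{\rho}_k^2$ and $LB = n(n+2)\sum_{k=1}^{m}\hat{\rho}_k^2/(n-k)$. The sketch below uses simulated series (white noise and an AR(1)), not the residuals of Eq. (22.5.2), and compares against the 5 percent chi-square critical value with 25 degrees of freedom. When the series are residuals from a fitted ARMA(p, q) model, many texts and packages reduce the degrees of freedom to m - p - q; the sketch leaves that choice to the user.

```python
import random

def sample_acf(y, max_lag):
    """Sample autocorrelations rho_hat_k for k = 0..max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

def q_statistics(resid, m):
    """Box-Pierce Q and Ljung-Box LB for residual autocorrelations at lags 1..m."""
    n = len(resid)
    r = sample_acf(resid, m)
    q_bp = n * sum(r[k] ** 2 for k in range(1, m + 1))
    q_lb = n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, m + 1))
    return q_bp, q_lb

CHI2_25_5PCT = 37.652   # 5 percent critical value, chi-square with 25 df

rng = random.Random(3)
white = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # simulated white noise
ar1 = [0.0]
for _ in range(499):                                  # simulated AR(1), phi = 0.6
    ar1.append(0.6 * ar1[-1] + rng.gauss(0.0, 1.0))

for name, series in [("white noise", white), ("AR(1)", ar1)]:
    q_bp, q_lb = q_statistics(series, 25)
    verdict = "reject" if q_lb > CHI2_25_5PCT else "do not reject"
    print(name, round(q_bp, 1), round(q_lb, 1), verdict, "white noise")
```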
22.7 Forecasting


Remember that the GDP data are for the period 1947-I to 2007-IV. Suppose, on the basis of model (22.5.2),
we want to forecast LGDP for the first four quarters of 2008. But in Eq. (22.5.2) the dependent variable is
the change in LGDP over the previous quarter. Therefore, if we use Eq. (22.5.2), what we can obtain are
the forecasts of LGDP changes between the first quarter of 2008 and the fourth quarter of 2007, between the second
quarter of 2008 and the first quarter of 2008, etc.
Figure 22.5 (a) Correlogram and (b) partial correlogram for residuals of MA(2) model for the first differences of LGDP,
United States, 1947-I to 2007-IV.
To obtain the forecast of the LGDP level rather than its changes, we can “undo” the first-difference transformation
that we had used to obtain the changes. (More technically, we integrate the first-differenced series.)
Thus, to obtain the forecast value of LGDP (not ΔLGDP) for 2008-I, we rewrite model (22.5.1) as

$$Y_{2008\text{-I}} - Y_{2007\text{-IV}} = \mu + \beta_1 u_{2007\text{-IV}} + \beta_2 u_{2007\text{-III}} + u_{2008\text{-I}} \tag{22.7.1}$$

That is,

$$Y_{2008\text{-I}} = \mu + \beta_1 u_{2007\text{-IV}} + \beta_2 u_{2007\text{-III}} + u_{2008\text{-I}} + Y_{2007\text{-IV}} \tag{22.7.2}$$

The values of $\mu$, $\beta_1$, and $\beta_2$ are already known from the estimated regression (22.5.2). The value of $u_{2008\text{-I}}$
is assumed to be zero (why?). Therefore, we can easily obtain the forecast value of $Y_{2008\text{-I}}$. The numerical
estimate of this forecast value is:⁹

$$
\begin{aligned}
\hat{Y}_{2008\text{-I}} &= 0.00823 + (0.2923)\,u_{2007\text{-IV}} + (0.2046)\,u_{2007\text{-III}} + Y_{2007\text{-IV}}\\
&= 0.00823 + (0.2923)(-0.00854) + (0.2046)(0.00399) + 9.3653\\
&\approx 9.37185
\end{aligned}
$$


Thus the forecast value of LGDP for 2008-I is about 9.3719, which is about $11,753 billion (2000 dollars).
Incidentally, the actual value of real GDP for 2008-I was $11,693.09 billion; the forecast error was an
overestimate of about $60 billion.
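The arithmetic of Eqs. (22.7.1) and (22.7.2) can be replayed in a few lines, using only the numbers given in the text. The one assumption, flagged in the code, is that LGDP is the natural logarithm of real GDP, which is what the step from 9.3719 to about $11,753 billion implies.

```python
import math

# Estimates from Eq. (22.5.2) and the values used in the text.
mu, b1, b2 = 0.00823, 0.2923, 0.2046
u_2007_iv, u_2007_iii = -0.00854, 0.00399   # estimated residuals
y_2007_iv = 9.3653                          # LGDP in 2007-IV

# Forecast of the change Y*_2008-I, with u_2008-I set to its expected value, zero.
delta_hat = mu + b1 * u_2007_iv + b2 * u_2007_iii
# "Undo" the first difference (Eq. 22.7.2): add the change to the last observed level.
y_2008_i_hat = y_2007_iv + delta_hat
# Back to billions of 2000 dollars, assuming LGDP is the natural log of real GDP.
gdp_2008_i_hat = math.exp(y_2008_i_hat)
error = gdp_2008_i_hat - 11693.09           # actual real GDP in 2008-I

print(round(y_2008_i_hat, 5), round(gdp_2008_i_hat), round(error))
# 9.37185 11753 60
```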


22.8 Further Aspects of the BJ Methodology 


In the preceding paragraphs we have provided but a sketchy introduction to BJ modeling. There are many
aspects of this methodology that we have not considered for lack of space, for example, seasonality. Many
time series exhibit seasonal behavior. Examples are sales by department stores in conjunction with major
holidays, seasonal consumption of ice cream, travel during public holidays, etc. If, for example, we had
data on department store sales by quarters, the sales figures would show spikes in the fourth quarter. In such
situations, one can remove the seasonal influence by taking four-quarter differences of the sales figures (each
quarter's sales minus the sales of the same quarter a year earlier) and then decide what kind of ARIMA model to fit.
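A four-quarter (seasonal) difference is simply $Y_t - Y_{t-4}$. In the small sketch below, the sales figures are hypothetical: a linear trend plus a fourth-quarter spike. Seasonal differencing removes the spike entirely and leaves only the year-over-year growth.

```python
def seasonal_difference(y, s=4):
    """Y_t - Y_{t-s}; for quarterly data (s = 4), each quarter minus the same quarter a year earlier."""
    return [y[t] - y[t - s] for t in range(s, len(y))]

# Hypothetical quarterly sales: trend of 2 per quarter plus a fourth-quarter spike of 30.
sales = [100 + 2 * t + (30 if t % 4 == 3 else 0) for t in range(12)]
print(seasonal_difference(sales))   # [8, 8, 8, 8, 8, 8, 8, 8]
```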

We have analyzed only a single time series at a time. But nothing prevents the BJ methodology from being
extended to the simultaneous study of two or more time series. A foray into that topic would take us far afield.
The interested reader may want to consult the references.¹⁰ In the following section, however, we discuss this
topic in the context of what is known as vector autoregression.


22.9 Vector Autoregression (VAR) 


In Chapters 18 to 20 we considered simultaneous, or structural, equation models. In such models some
variables are treated as endogenous and some as exogenous or predetermined (exogenous plus lagged endogenous).
Before we estimate such models, we have to make sure that the equations in the system are identified
(either exactly identified or overidentified). This identification is often achieved by assuming that some of the predetermined
variables are present only in some equations. This decision is often subjective and has been severely criticized
by Christopher Sims.¹¹


⁹Although standard computer packages do this computation routinely, we show the detailed calculations to illustrate the
mechanics involved.

¹⁰For an accessible treatment of this subject, see Terence C. Mills, op. cit., Part III.

¹¹C. A. Sims, “Macroeconomics and Reality,” Econometrica, vol. 48, 1980, pp. 1-48.
According to Sims, if there is true simultaneity among a set of variables, they should all be treated on an
equal footing; there should not be any a priori distinction between endogenous and exogenous variables. It is 
in this spirit that Sims developed his VAR model. 

The seeds of this model were already sown in the Granger causality test discussed in Chapter 17. In Eqs. 
(17.14.1) and (17.14.2), which explain current LGDP in terms of lagged money supply and lagged LGDP and 
current money supply in terms of lagged money supply and lagged LGDP, respectively, we are essentially 
treating LGDP and money supply as a pair of endogenous variables. There are no exogenous variables in this 
system. 

Similarly, in Example 17.13 we examined the nature of causality between money and interest rate in 
Canada. In the money equation, only the lagged values of money and interest rate appear, and in the interest 
rate equation only the lagged values of interest rate and money appear. 

Both these examples are illustrations of vector autoregressive models; the term autoregressive is due to 
the appearance of the lagged value of the dependent variable on the right-hand side and the term vector is due 
to the fact that we are dealing with a vector of two (or more) variables. 


Estimation of VAR


Returning to the Canadian money-interest rate example, we saw that when we introduced six lags of each
variable as regressors, we could not reject the hypothesis that there was bilateral causality between money
(M1) and the interest rate, R (90-day corporate interest rate). That is, M1 affects R and R affects M1. These kinds
of situations are ideally suited for the application of VAR.

To explain how a VAR is estimated, we will continue with the preceding example. For now we assume
that each equation contains k lagged values of M (as measured by M1) and R. In this case, one can estimate each
of the following equations by OLS.¹²


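$$M_{1t} = \alpha + \sum_{j=1}^{k} \beta_j M_{1,t-j} + \sum_{j=1}^{k} \gamma_j R_{t-j} + u_{1t} \tag{22.9.1}$$

$$R_t = \alpha' + \sum_{j=1}^{k} \theta_j M_{1,t-j} + \sum_{j=1}^{k} \lambda_j R_{t-j} + u_{2t} \tag{22.9.2}$$

where the u's are the stochastic error terms, called impulses or innovations or shocks in the language of
VAR.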
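Because both equations share exactly the same regressors (a constant plus k lags of M1 and of R), each can be estimated by ordinary least squares on its own, as footnote 12 explains. The sketch below does this in plain Python with a small normal-equations solver. The data come from a simulated two-variable VAR(1) whose coefficients are hypothetical, chosen only so that we can check the estimates recover them; the Canadian data are not used.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, target):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)

def fit_var2(m_series, r_series, k):
    """Equation-by-equation OLS for a two-variable VAR(k), as in Eqs. (22.9.1)-(22.9.2).

    Both equations use the regressors [1, M_{t-1..t-k}, R_{t-1..t-k}].
    """
    rows, m_target, r_target = [], [], []
    for t in range(k, len(m_series)):
        rows.append([1.0]
                    + [m_series[t - j] for j in range(1, k + 1)]
                    + [r_series[t - j] for j in range(1, k + 1)])
        m_target.append(m_series[t])
        r_target.append(r_series[t])
    return ols(rows, m_target), ols(rows, r_target)

# Simulated VAR(1) with hypothetical coefficients (not the Canadian data):
#   M_t = 1.0 + 0.5 M_{t-1} + 0.2 R_{t-1} + u1,  R_t = 0.5 - 0.1 M_{t-1} + 0.6 R_{t-1} + u2
rng = random.Random(11)
m_s, r_s = [2.0], [0.7]
for _ in range(4999):
    m_prev, r_prev = m_s[-1], r_s[-1]
    m_s.append(1.0 + 0.5 * m_prev + 0.2 * r_prev + rng.gauss(0.0, 1.0))
    r_s.append(0.5 - 0.1 * m_prev + 0.6 * r_prev + rng.gauss(0.0, 1.0))

coef_m, coef_r = fit_var2(m_s, r_s, k=1)
print([round(c, 2) for c in coef_m])   # close to [1.0, 0.5, 0.2]
print([round(c, 2) for c in coef_r])   # close to [0.5, -0.1, 0.6]
```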


Before we estimate Eqs. (22.9.1) and (22.9.2) we have to decide on the maximum lag length, k. This is an
empirical question. We have 40 observations in all. Including too many lagged terms will consume degrees
of freedom, not to mention introducing the possibility of multicollinearity. Including too few lags will lead
to specification errors. One way of deciding this question is to use a criterion such as the Akaike or Schwarz
criterion and choose the model that gives the lowest values of these criteria. There is no question that some
trial and error is inevitable.

To illustrate the mechanics, we initially used four lags (k = 4) of each variable and, using EViews 6,
obtained the estimates of the parameters of the preceding two equations, which are given in Table 22.2. Note
that although our sample runs from 1979-I to 1988-IV, we used the sample for the period 1980-I to 1987-IV
(the four quarters of 1979 supply the initial lagged values) and saved the last four observations to check the
forecasting accuracy of the fitted VAR.


¹²One can use the SURE (seemingly unrelated regression) technique to estimate the two equations together. However,
since each regression contains the same set of regressors (the same lagged values of both endogenous variables), the OLS
estimation of each equation separately produces identical (and efficient) estimates.


Table 22.2 Vector Autoregression Estimates Based on 4 Lags

Sample (adjusted): 1980-I to 1987-IV
Included observations: 32 after adjustments
Standard errors in ( ) and t statistics in [ ]

                  M1                                     R
M1(-1)        1.076738 (0.201737) [5.337337]       0.001282 (0.000674) [1.900824]
M1(-2)        0.173432 (0.314438) [0.551561]      -0.002140 (0.001051) [-2.035830]
M1(-3)       -0.366464 (0.346874) [-1.056474]      0.002176 (0.001160) [1.876975]
M1(-4)        0.077602 (0.207888) [0.373286]      -0.001479 (0.000695) [-2.128543]
R(-1)      -275.0290 (57.21736) [-4.806740]        1.139310 (0.191265) [5.956702]
R(-2)       227.1744 (95.39484) [2.381412]        -0.309056 (0.318884) [-0.969179]
R(-3)         8.511942 (96.91769) [0.087827]       0.052365 (0.323975) [0.161633]
R(-4)       -50.19906 (64.75540) [-0.775210]       0.001073 (0.216463) [0.004958]
C          2413.824 (1622.646) [1.487585]          4.919031 (5.424156) [0.906875]

R²                   0.988154                        0.852889
Adj. R²              0.984034                        0.801721
Sum square residuals 4820241                         53.86238
SE equation          457.7944                        1.530308
F statistic          239.8315                        16.66813
Log likelihood      -236.1676                       -53.73718
Akaike AIC           15.32298                        3.921073
Schwarz SC           15.73521                        4.333312
Mean dependent       28514.53                        11.67291
SD dependent         3623.058                        3.436688


Since the preceding equations are OLS regressions, the output given in Table 22.2 is to be interpreted in the
usual fashion. Of course, with several lags of the same variables, not every estimated coefficient will be
statistically significant, possibly because of multicollinearity. But collectively, they may be significant on
the basis of the standard F test.

Let us examine the results presented in Table 22.2. First consider the M1 regression. Individually, only
M1 at lag 1 and R at lags 1 and 2 are statistically significant. But the F value is so high that we can reject
the hypothesis that all the lagged terms are collectively equal to zero. Turning to the interest rate
regression, we see that all four lagged money terms are individually statistically significant (at the 10
percent or better level), whereas only the 1-period lagged interest rate variable is significant.

For comparative purposes, we present in Table 22.3 the VAR results based on only 2 lags of each endogenous
variable. Here you will see that in the money regression the 1-period lagged money variable and both
lagged interest rate terms are individually statistically significant. In the interest rate regression, both lagged
money terms (at about the 5 percent level) and one lagged interest rate term are individually significant.
If we have to make a choice between the model given in Table 22.2 and that given in Table 22.3, which
would we choose? The Akaike and Schwarz information values for the money equation in Table 22.2 are,
respectively, 15.32 and 15.74, whereas the corresponding values for Table 22.3 are 15.10 and 15.33. Since the
lower the values of the Akaike and Schwarz statistics, the better the model, on that basis the more parsimonious
model given in Table 22.3 seems preferable. We also considered 6 lags of each of the endogenous variables and
found that the values of the Akaike and Schwarz statistics were 15.38 and 15.99, respectively. Again, the choice
seems to be the model with two lagged terms of each endogenous variable, that is, the model in Table 22.3.
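The tabulated criteria can be reproduced from each equation's log likelihood $\ell$, its number of observations n, and its number of parameters k. The sketch assumes the per-observation definitions $AIC = -2\ell/n + 2k/n$ and $SC = -2\ell/n + k\ln n / n$. These are the forms EViews reports, to our understanding, and they match Tables 22.2 and 22.3 to the digits shown, which is what the checks below confirm.

```python
import math

def eviews_style_ic(loglik, n_obs, n_params):
    """Per-observation information criteria:
    AIC = -2*l/n + 2*k/n,  SC = -2*l/n + k*ln(n)/n."""
    base = -2.0 * loglik / n_obs
    aic = base + 2.0 * n_params / n_obs
    sc = base + n_params * math.log(n_obs) / n_obs
    return aic, sc

# Money (M1) equations. 4 lags: 2*4 + 1 = 9 parameters, 32 observations (Table 22.2).
aic4, sc4 = eviews_style_ic(-236.1676, 32, 9)   # table: 15.32298 and 15.73521
# 2 lags: 2*2 + 1 = 5 parameters, 34 observations (Table 22.3).
aic2, sc2 = eviews_style_ic(-251.7446, 34, 5)   # table: 15.10263 and 15.32709
print(round(aic4, 4), round(sc4, 4), round(aic2, 4), round(sc2, 4))
print("prefer 2 lags" if (aic2 < aic4 and sc2 < sc4) else "prefer 4 lags")
```

Note that the two models are fitted to slightly different samples (32 versus 34 observations), which is one reason such comparisons are only a rough guide.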


Forecasting with VAR 


Suppose we choose the model given in Table 22.3. We can use it for the purpose of forecasting the values of
M1 and R. Remember that our data cover the period 1979-I to 1988-IV, but we have not used the values for
1988 in estimating the VAR models. Now suppose we want to forecast the value of M1 for 1988-I, that is, the
first quarter of 1988. The forecast value for 1988-I can be obtained as follows:

$$\hat{M}_{1,1988\text{-I}} = 1451.976 + 1.0375\,M_{1,1987\text{-IV}} - 0.0447\,M_{1,1987\text{-III}} - 234.8848\,R_{1987\text{-IV}} + 160.1559\,R_{1987\text{-III}}$$

where the coefficient values are obtained from Table 22.3. Now using the appropriate values of M1 and R from
Table 17.5, the forecast value of money for the first quarter of 1988 can be seen to be 36,996 (millions of
Canadian dollars). The actual value of M1 for 1988-I was 36,480, which means that our model overpredicted
the actual value by about 516 (millions of dollars), which is about 1.4 percent of the actual M1 for 1988-I. Of
course, these estimates will change, depending on how many lagged values we consider in the VAR model. It
is left as an exercise for the reader to forecast the value of R for the first quarter of 1988 and compare it with
its actual value for that quarter.
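The one-step forecast above is a plain linear combination of last year's values. The sketch below encodes the money equation with the full-precision coefficients from Table 22.3 (the text rounds two of them). The 1987 values of M1 and R are in Table 17.5 and are not reproduced here, so the usage only checks the quoted percentage error from the forecast and actual values reported in the text.

```python
# Money (M1) equation of Table 22.3.
COEF_M1 = {"const": 1451.976, "m_lag1": 1.037538, "m_lag2": -0.044662,
           "r_lag1": -234.8848, "r_lag2": 160.1559}

def forecast_m1(m_lag1, m_lag2, r_lag1, r_lag2, coef=COEF_M1):
    """One-step-ahead forecast of M1 from the 2-lag VAR money equation.
    For 1988-I: m_lag1 = M1(1987-IV), m_lag2 = M1(1987-III), and likewise for R."""
    return (coef["const"]
            + coef["m_lag1"] * m_lag1 + coef["m_lag2"] * m_lag2
            + coef["r_lag1"] * r_lag1 + coef["r_lag2"] * r_lag2)

def percent_error(forecast, actual):
    """Forecast error as a percentage of the actual value (positive = overprediction)."""
    return 100.0 * (forecast - actual) / actual

# Forecast and actual M1 for 1988-I as reported in the text.
print(round(percent_error(36996, 36480), 2))   # 1.41
```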


VAR and Causality 


You may recall that we discussed the topic of causality in Chapter 17. There we considered the Granger and
Sims tests of causality. Is there any connection between VAR and causality? In Chapter 17 (Section 17.14)
we saw that up to 2, 4, and 6 lags there was bilateral causality between M1 and R, but at lag 8 there was no
causality between the two variables. Thus, the results are mixed. Now you may recall from Chapter 21 the
Granger representation theorem. One of the implications of this theorem is that if two variables, say, $X_t$ and
$Y_t$, are cointegrated and each is individually I(1), that is, integrated of order 1 (i.e., each is individually
nonstationary), then either $X_t$ must Granger-cause $Y_t$ or $Y_t$ must Granger-cause $X_t$.

In our illustrative example this means that if M1 and R are individually I(1) but are cointegrated, then either
M1 must Granger-cause R or R must Granger-cause M1. This means we must first find out whether the two variables
are individually I(1) and then find out whether they are cointegrated. If this is not the case, then the whole question
of causality may become moot. In Exercise 22.22, the reader is asked to find out whether the two variables are
nonstationary but cointegrated. If you do the exercise, you will find that there is some weak evidence of
cointegration between M1 and R, which is why the causality tests discussed in Section 17.14 were equivocal.


Some Problems with VAR Modeling 


The advocates of VAR emphasize these virtues of the method: (1) The method is simple: one does not have
to worry about determining which variables are endogenous and which ones are exogenous. All variables
in VAR are endogenous.¹³ (2) Estimation is simple; that is, the usual OLS method can be applied to each
equation separately. (3) The forecasts obtained by this method are in many cases better than those obtained
from the more complex simultaneous-equation models.¹⁴

Table 22.3 Vector Autoregression Estimates Based on 2 Lags

Sample (adjusted): 1979-III to 1987-IV
Included observations: 34 after adjustments
Standard errors in ( ) and t statistics in [ ]

                  M1                                     R
M1(-1)        1.037538 (0.160483) [6.465094]       0.001091 (0.000587) [1.858252]
M1(-2)       -0.044662 (0.155908) [-0.286465]     -0.001255 (0.000571) [-2.198708]
R(-1)      -234.8848 (45.52235) [-5.159770]        1.069082 (0.166599) [6.417090]
R(-2)       160.1559 (48.52833) [3.300256]        -0.223365 (0.177600) [-1.257685]
C          1451.976 (1185.593) [1.224684]          5.796446 (4.338940) [1.335913]

R²                   0.988198                        0.806661
Adj. R²              0.986571                        0.779993
Sum square residuals 5373508                         71.97045
SE equation          430.4572                        1.575354
F statistic          607.0723                        30.24882
Log likelihood      -251.7446                       -60.99213
Akaike AIC           15.10263                        3.881890
Schwarz SC           15.32709                        4.106355
Mean dependent       28216.26                        11.75049
SD dependent         3714.507                        3.358613

But the critics of VAR modeling point out the following problems: 


1. Unlike simultaneous-equation models, a VAR model is atheoretic because it uses less prior information.
Recall that in simultaneous-equation models exclusion or inclusion of certain variables plays a crucial role in
the identification of the model.

2. Because of their emphasis on forecasting, VAR models are less suited for policy analysis.

3. The biggest practical challenge in VAR modeling is to choose the appropriate lag length. Suppose you
have a three-variable VAR model and you decide to include eight lags of each variable in each equation. You
will have 24 lagged parameters in each equation plus the constant term, for a total of 25 parameters. Unless
the sample size is large, estimating that many parameters will consume a lot of degrees of freedom, with all
the problems associated with that.¹⁵


¹³Sometimes purely exogenous variables are included to allow for trend and seasonal factors.

¹⁴See, for example, T. Kinal and J. B. Ratner, “Regional Forecasting Models with Vector Autoregression: The Case of New
York State,” Discussion Paper #155, Department of Economics, State University of New York at Albany, 1982.

¹⁵If we have an m-equation VAR model with p lagged values of the m variables, in all we have to estimate $(m + pm^2)$
parameters.
4. Strictly speaking, in an m-variable VAR model, all the m variables should be (jointly) stationary. If
that is not the case, we will have to transform the data appropriately (e.g., by first-differencing). As Harvey
notes, the results from the transformed data may be unsatisfactory. He further notes that “The usual approach
adopted by VAR aficionados is therefore to work in levels, even if some of these series are nonstationary. In
this case, it is important to recognize the effect of unit roots on the distribution of estimators.”¹⁶ Worse yet,
if the model contains a mix of I(0) and I(1) variables, that is, a mix of stationary and nonstationary variables,
transforming the data will not be easy.

However, Cuthbertson argues that “. . . cointegration analysis indicates that a VAR solely in first differences
is misspecified, if there are some cointegrating vectors present among the I(1) series. Put another way,
a VAR solely in first differences omits potentially important stationary variables (i.e., the error-correction,
cointegrating vectors) and hence parameter estimates may suffer from omitted variables bias.”¹⁷

5. Since the individual coefficients in the estimated VAR models are often difficult to interpret, the practitioners
of this technique often estimate the so-called impulse response function (IRF). The IRF traces out
the response of the dependent variable in the VAR system to shocks in the error terms, such as $u_{1t}$ and $u_{2t}$ in
Eqs. (22.9.1) and (22.9.2). Suppose $u_{1t}$ in the M1 equation increases by one standard deviation.
Such a shock will change M1 in the current as well as future periods. But since M1 appears in the
R regression, the change in $u_{1t}$ will also have an impact on R. Similarly, a change of one standard deviation
in $u_{2t}$ of the R equation will have an impact on M1. The IRF traces out the impact of such shocks for several
periods in the future. Although the utility of such IRF analysis has been questioned by researchers, it is the
centerpiece of VAR analysis.¹⁸
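The mechanics of item 5 can be sketched with the simplest kind of impulse response. Starting from a shock vector, the response h periods later follows the VAR's own lag structure: $x_h = A_1 x_{h-1} + \dots + A_p x_{h-p}$. The sketch uses the lag coefficients of Table 22.3 and a shock equal to one standard error of the M1 equation. It is non-orthogonalized and ignores any correlation between $u_{1t}$ and $u_{2t}$. Packages commonly orthogonalize the shocks instead (for example, with a Cholesky factorization of the error covariance matrix), so their IRFs will generally differ from this one.

```python
def impulse_responses(lag_mats, shock, horizon):
    """Non-orthogonalized impulse responses of a VAR(p).

    lag_mats = [A1, ..., Ap] (each an m x m list of lists) for the system
    x_t = c + A1 x_{t-1} + ... + Ap x_{t-p} + u_t. The response path is
    x_0 = shock and x_h = A1 x_{h-1} + ... + Ap x_{h-p}, with x_h = 0 for h < 0.
    """
    m = len(shock)
    path = [list(shock)]
    for h in range(1, horizon + 1):
        x = [0.0] * m
        for j, a in enumerate(lag_mats, start=1):
            if h - j < 0:
                break
            prev = path[h - j]
            for i in range(m):
                x[i] += sum(a[i][c] * prev[c] for c in range(m))
        path.append(x)
    return path

# Lag matrices from Table 22.3, variables ordered (M1, R); the constants do not matter here.
A1 = [[1.037538, -234.8848],
      [0.001091, 1.069082]]
A2 = [[-0.044662, 160.1559],
      [-0.001255, -0.223365]]
# A shock of one standard error of the M1 equation (SE equation = 430.4572), no shock to R.
shock_m1 = [430.4572, 0.0]
for h, (dm1, dr) in enumerate(impulse_responses([A1, A2], shock_m1, 4)):
    print(h, round(dm1, 1), round(dr, 3))
```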


For a comparison of the performance of VAR with other forecasting techniques, the reader may consult
the references.¹⁹


An Application of VAR: A VAR Model of the Texas Economy 


To test the conventional wisdom, “As the oil patch goes, so goes the Texas economy,” Thomas Fomby and
Joseph Hirschberg developed a three-variable VAR model of the Texas economy for the period 1974-I to
1988-I.²⁰ The three variables considered were (1) percentage change in real price of oil, (2) percentage
change in Texas nonagricultural employment, and (3) percentage change in nonagricultural employment in
the rest of the United States. The authors introduced the constant term and two lagged values of each variable
in each equation. Therefore, the number of parameters estimated in each equation was seven. The results
of the OLS estimation of the VAR model are given in Table 22.4. The F tests given in this table test
the hypothesis that collectively the various lagged coefficients are zero. Thus, the F test for the x variable
(percentage change in real price of oil) shows that the two lagged terms of x are jointly statistically different from
zero; the probability of obtaining an F value of 12.5536 under the null hypothesis that they are both
simultaneously equal to zero is very low, about 0.00004. On the other hand, collectively, the two lagged y values
(percentage change in Texas nonagricultural employment) are not significantly different from zero in explaining
x; the F value is only 1.36. All other F statistics are to be interpreted similarly.
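Each joint test in Table 22.4 is a standard F test: $F = [(RSS_R - RSS_{UR})/q]/[RSS_{UR}/(n-k)]$, here with q = 2 restrictions. With two numerator degrees of freedom, the tail probability has a closed form, $P(F > f) = (1 + 2f/\nu)^{-\nu/2}$, where $\nu$ is the denominator degrees of freedom. The source does not state $\nu$; the sketch assumes $\nu = 47$ because that value reproduces the significance levels tabulated for the x equation. The RSS values in the usage line are hypothetical and only illustrate the arithmetic.

```python
def f_statistic(rss_restricted, rss_unrestricted, q, n, k):
    """Joint F test that q coefficients are zero; n observations and
    k parameters in the unrestricted equation."""
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / (n - k))

def f2_pvalue(f, df2):
    """Upper-tail probability of an F(2, df2) variable. With 2 numerator
    degrees of freedom this has the closed form (1 + 2f/df2) ** (-df2/2)."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)

# Hypothetical RSS values, only to show the arithmetic.
print(f_statistic(120.0, 100.0, 2, 57, 7))   # 5.0

# Oil-price (x) equation of Table 22.4, assuming 47 denominator degrees of freedom.
for name, f in [("x", 12.5536), ("y", 1.3646), ("z", 1.5693)]:
    print(name, f, f"{f2_pvalue(f, 47):.4g}")
```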


¹⁶Andrew Harvey, The Econometric Analysis of Time Series, The MIT Press, 2d ed., Cambridge, Mass., 1990, p. 83.

¹⁷Keith Cuthbertson, Quantitative Financial Economics: Stocks, Bonds and Foreign Exchange, John Wiley & Sons, New York,
2002, p. 436.

¹⁸D. E. Runkle, “Vector Autoregression and Reality,” Journal of Business and Economic Statistics, vol. 5, 1987, pp. 437-454.

¹⁹S. McNees, “Forecasting Accuracy of Alternative Techniques: A Comparison of U.S. Macroeconomic Forecasts,” Journal of
Business and Economic Statistics, vol. 4, 1986, pp. 5-15; and E. Mahmoud, “Accuracy in Forecasting: A Survey,” Journal of
Forecasting, vol. 3, 1984, pp. 139-159.

²⁰Thomas B. Fomby and Joseph G. Hirschberg, “Texas in Transition: Dependence on Oil and the National Economy,” Economic
Review, Federal Reserve Bank of Dallas, January 1989, pp. 11-28.
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Table 22.4 Estimation Results for Second-Order* Texas VAR System: 1974-1 to 1988-1 


Dependent variable: x (percentage change in real price of oil) 


Variable Lag Coefficient Standard error Significance level 

x 1 0.7054 0.1409 0.8305E—5 

x 2 ~—0.3351 0.1500 0.3027E-—1 

y 1 —1.3525 2.7013 0.6189 

y 2 3.4371 2.4344 0.1645 

Z 1 3.4566 2.8048 0.2239 

Z 2 —4.8703 2.7500 0.8304E—1 
Constant 0 —0.9983E—2 0.1696E—1 0.5589 


R? = 0.2982; Q(21) = 8.2618 (P = 0.9939) 
Tests for joint significance, dependent variable = x 


Variable F-statistic Significance level 

x 12.5536 0.4283E—4 

y 1.3646 0.2654 

Zz 1.5693 0.2188 

Dependent variable: y (percentage change in Texas nonagricultural employment)

Variable   Lag   Coefficient   Standard error   Significance level
x          1      0.2228E-1     0.8759E-2        0.1430E-1
x          2     -0.1883E-2     0.9322E-2        0.8407
y          1      0.6462        0.1678           0.3554E-3
y          2      0.4234E-1     0.1512           0.7807
z          1      0.2655        0.1742           0.1342
z          2     -0.1715        0.1708           0.3205
Constant   0     -0.1602E-2     0.1053E-1        0.1351

R² = 0.6316; Q(21) = 21.5900 (P = 0.4234)

Tests for joint significance, dependent variable = y

Variable   F-statistic   Significance level
x           3.6283       0.3424E-1
y          19.1440       0.8287E-6
z           1.1684       [illegible]


Dependent variable: z (percentage change in nonagricultural employment in rest of United States)

Variable   Lag   Coefficient   Standard error   Significance level
x          1     -0.8330E-2     0.6849E-2        0.2299
x          2      0.3635E-2     0.7289E-2        0.6202
y          1      0.3849        0.1312           0.5170E-2
y          2     -0.4805        0.1182           0.1828E-2
z          1      0.7226        0.1362           0.3004E-5
z          2     -0.1366E-1     0.1336           0.9190
Constant   0     -0.2387E-2     0.8241E-3        0.5701E-2

R² = 0.6503; Q(21) = 15.6182 (P = 0.7907)

Tests for joint significance, dependent variable = z

Variable   F-statistic   Significance level
x           0.7396       0.4827
y           8.2714       0.8360E-3
z          27.9609       0.1000E-7


*Two lagged terms of each variable.


Source: Economic Review, Federal Reserve Bank of Dallas, January 1989, p. 21. 
On the basis of these and other results presented in their paper, Fomby and Hirschberg conclude that the
conventional wisdom about the Texas economy is not quite accurate, for after the initial instability resulting 
from OPEC oil shocks, the Texas economy is now less dependent on fluctuations in the price of oil. 
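Each F statistic in Table 22.4 compares an equation with and without both lags of one variable. A minimal sketch of that restricted-versus-unrestricted F test, assuming NumPy and simulated series in place of the Fomby-Hirschberg data; the function names are hypothetical, and the sketch computes only the F value, not its significance level:

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from an OLS regression of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return float(e @ e)

def lagged(series, p):
    """Columns series_{t-1}, ..., series_{t-p}, aligned with t = p, ..., T-1."""
    T = len(series)
    return np.column_stack([series[p - j:T - j] for j in range(1, p + 1)])

def var_lag_f_test(series, target, excluded, p=2):
    """F statistic for H0: all p lags of `excluded` are zero in the equation for `target`.

    `series` maps variable names to equal-length 1-D arrays; each equation has a
    constant and p lags of every variable, as in Table 22.4.
    """
    y = series[target][p:]
    const = np.ones((len(y), 1))
    X_u = np.hstack([const] + [lagged(s, p) for s in series.values()])
    X_r = np.hstack([const] + [lagged(s, p) for name, s in series.items() if name != excluded])
    n, k = X_u.shape
    rss_u, rss_r = rss(y, X_u), rss(y, X_r)
    return ((rss_r - rss_u) / p) / (rss_u / (n - k))   # p restrictions, n - k df

# Simulated data: y responds to last period's x; z is unrelated noise.
rng = np.random.default_rng(0)
T = 200
x = rng.normal(size=T)
z = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.3 * rng.normal()
data = {"x": x, "y": y, "z": z}
f_x = var_lag_f_test(data, "y", "x")
f_z = var_lag_f_test(data, "y", "z")
print(f"F(lags of x in y equation) = {f_x:.2f}; F(lags of z in y equation) = {f_z:.2f}")
```

Because every equation has the same regressors, each one can be fitted separately by OLS, which is why a plain least-squares helper suffices here.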


22.10 Measuring Volatility in Financial Time Series: The ARCH and 
GARCH Models 


As noted in the introduction to this chapter, financial time series, such as stock prices, exchange rates, 
inflation rates, etc., often exhibit the phenomenon of volatility clustering, that is, periods in which their 
prices show wide swings for an extended time period followed by periods in which there is relative calm. As 
Philip Franses notes: 
Since such [financial time series] data reflect the result of trading among buyers and sellers at, for example, stock
markets, various sources of news and other exogenous economic events may have an impact on the time series
pattern of asset prices. Given that news can lead to various interpretations, and also given that specific economic
events like an oil crisis can last for some time, we often observe that large positive and large negative observations
in financial time series tend to appear in clusters.21


Knowledge of volatility is of crucial importance in many areas. For example, considerable macroeconometric
work has been done in studying the variability of inflation over time. For some decision makers,
inflation in itself may not be bad, but its variability is bad because it makes financial planning difficult.

The same is true of importers, exporters, and traders in foreign exchange markets, for variability in the 
exchange rates means huge losses or profits. Investors in the stock market are obviously interested in the 
volatility of stock prices, for high volatility could mean huge losses or gains and hence greater uncertainty. In 
volatile markets it is difficult for companies to raise capital in the capital markets. 

How do we model financial time series that may experience such volatility? For example, how do we
model time series of stock prices, exchange rates, inflation, etc.? A characteristic of most of these financial
time series is that in their level form they are random walks; that is, they are nonstationary. On the other hand,
in the first difference form, they are generally stationary, as we saw in the case of GDP series in the previous
chapter, even though GDP is not strictly a financial time series.

Therefore, instead of modeling the levels of financial time series, why not model their first differences?
But these first differences often exhibit wide swings, or volatility, suggesting that the variance of financial
time series varies over time. How can we model such “varying variance”? This is where the so-called
autoregressive conditional heteroscedasticity (ARCH) model originally developed by Engle comes in handy.22

As the name suggests, heteroscedasticity, or unequal variance, may have an autoregressive structure in that 
heteroscedasticity observed over different periods may be autocorrelated. To see what all this means, let us 
consider a concrete example. 


Example 22.1 U.S./U.K. Exchange Rate: An Example 


Figure 22.6 gives logs of the monthly U.S./U.K. exchange rate (dollars per pound) for the period 1971-2007,
for a total of 444 monthly observations. As you can see from this figure, there are considerable ups and downs
in the exchange rate over the sample period. To see this more vividly, in Figure 22.7 we plot the changes
in the logs of the exchange rate; note that changes in the log of a variable denote relative changes, which,
if multiplied by 100, give percentage changes. As you can observe, the relative changes in the U.S./U.K.
exchange rate show periods of wide swings for some time periods and periods of rather moderate swings in
other time periods, thus exemplifying the phenomenon of volatility clustering.

Figure 22.6 Log of U.S./U.K. exchange rate, 1971-2007 (monthly)

21Philip Hans Franses, Time Series Models for Business and Economic Forecasting, Cambridge University Press, New York, 1998.

22R. Engle, “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation,”
Econometrica, vol. 50, no. 4, 1982, pp. 987-1007. See also A. Bera and M. Higgins, “ARCH Models: Properties, Estimation
and Testing,” Journal of Economic Surveys, vol. 7, 1993, pp. 305-366.

Now the practical question is: How do we statistically measure volatility? Let us illustrate this with our 
exchange rate example. 
Figure 22.7 Change in the log of U.S./U.K. exchange rate.
Let $Y_t$ = U.S./U.K. exchange rate
$Y_t^*$ = log of $Y_t$
$dY_t^* = Y_t^* - Y_{t-1}^*$ = relative change in the exchange rate
$\overline{dY^*}$ = mean of $dY_t^*$
$X_t = dY_t^* - \overline{dY^*}$

Thus, $X_t$ is the mean-adjusted relative change in the exchange rate. Now we can use $X_t^2$ as a measure of
volatility. Being a squared quantity, its value will be high in periods when there are big changes in the prices
of financial assets and comparatively small when there are modest changes in the prices of
financial assets.23
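The definitions of $Y_t^*$, $dY_t^*$, and $X_t$ translate directly into array operations. A minimal sketch, assuming NumPy and natural logarithms; `volatility_proxy` is a hypothetical helper name and the exchange rate values are made up for illustration:

```python
import numpy as np

def volatility_proxy(rates):
    """Return X_t^2 for t = 2, ..., T from exchange rates Y_1, ..., Y_T."""
    y_star = np.log(np.asarray(rates, dtype=float))   # Y*_t = ln Y_t
    dy_star = np.diff(y_star)                         # dY*_t = Y*_t - Y*_{t-1}
    x = dy_star - dy_star.mean()                      # X_t: mean-adjusted relative change
    return x ** 2                                     # X_t^2: the volatility measure

# Hypothetical dollars-per-pound values, for illustration only.
rates = [2.00, 2.02, 1.98, 2.05, 1.90, 1.95]
print(volatility_proxy(rates))
```

Note that one observation is lost to differencing, so $T$ rates yield $T - 1$ values of $X_t^2$.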

Accepting $X_t^2$ as a measure of volatility, how do we know if it changes over time? Suppose we consider the
following AR(1), or ARIMA(1, 0, 0), model:

$$X_t^2 = \beta_0 + \beta_1 X_{t-1}^2 + u_t \qquad (22.10.1)$$


This model postulates that volatility in the current period is related to its value in the previous period plus
a white noise error term. If $\beta_1$ is positive, it suggests that if volatility was high in the previous period, it will
continue to be high in the current period, indicating volatility clustering. If $\beta_1$ is zero, then there is no volatility
clustering. The statistical significance of the estimated $\beta_1$ can be judged by the usual t test.

There is nothing to prevent us from considering an AR(p) model of volatility such that

$$X_t^2 = \beta_0 + \beta_1 X_{t-1}^2 + \beta_2 X_{t-2}^2 + \cdots + \beta_p X_{t-p}^2 + u_t \qquad (22.10.2)$$


This model suggests that volatility in the current period is related to volatility in the past p periods, the value of
p being an empirical question. This empirical question can be resolved by one or more of the model selection
criteria that we discussed in Chapter 13 (e.g., the Akaike information criterion). We can test the significance
of any individual $\beta$ coefficient by the t test and the collective significance of two or more coefficients by the
usual F test.

Model (22.10.1) is an example of an ARCH(1) model and Eq. (22.10.2) is called an ARCH(p) model, 
where p represents the number of autoregressive terms in the model. 
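Because (22.10.1) and (22.10.2) are ordinary autoregressions in $X_t^2$, they can be estimated by OLS, with the usual t ratios for the individual coefficients. A minimal sketch, assuming NumPy; `fit_arch_ols` is a hypothetical helper, and a simulated AR(1) series with known coefficients stands in for $X_t^2$ so that the estimates can be checked:

```python
import numpy as np

def fit_arch_ols(x2, p=1):
    """OLS estimates of (22.10.2): x2_t on a constant and x2_{t-1}, ..., x2_{t-p}.

    Returns (coefficients, t ratios, R^2), coefficients ordered beta_0, ..., beta_p.
    """
    x2 = np.asarray(x2, dtype=float)
    T = len(x2)
    y = x2[p:]
    X = np.column_stack([np.ones(T - p)] + [x2[p - j:T - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    n, k = X.shape
    s2 = (e @ e) / (n - k)                                # estimated error variance
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))    # OLS standard errors
    r2 = 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)
    return beta, beta / se, r2

# Simulated series with beta_0 = 0.5 and beta_1 = 0.3 (illustration only).
rng = np.random.default_rng(1)
x2 = np.empty(5000)
x2[0] = 0.5 / (1 - 0.3)
for t in range(1, len(x2)):
    x2[t] = 0.5 + 0.3 * x2[t - 1] + 0.1 * rng.normal()
beta, tstats, r2 = fit_arch_ols(x2, p=1)
print(beta, tstats, r2)
```

Raising `p` fits the ARCH(p) version; comparing the fits by an information criterion is one way to settle the choice of p mentioned above.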

Before proceeding further, let us illustrate the ARCH model with the U.S./U.K. exchange rate data. The
results of the ARCH(1) model were as follows:

$$\hat{X}_t^2 = 0.00043 + 0.23036\,X_{t-1}^2 \qquad (22.10.3)$$
t = (7.71)   (4.97)
$R^2 = 0.0531$   $d = 1.9933$

where $X_t^2$ is as defined before.

Since the coefficient of the lagged term is highly significant (p value of about 0.000), it seems volatility
clustering is present in this instance. We tried higher-order ARCH models, but only the ARCH(1) model
turned out to be significant.
How would we test for the ARCH effect in a regression model in general that is based on time series data?
To be more specific, let us consider the k-variable linear regression model:

$$Y_t = \beta_1 + \beta_2 X_{2t} + \cdots + \beta_k X_{kt} + u_t \qquad (22.10.4)$$

and assume that, conditional on the information available at time (t - 1), the disturbance term is distributed as

$$u_t \sim N\left[0, \left(\alpha_0 + \alpha_1 u_{t-1}^2\right)\right] \qquad (22.10.5)$$


that is, $u_t$ is normally distributed with zero mean and

$$\operatorname{var}(u_t) = \alpha_0 + \alpha_1 u_{t-1}^2 \qquad (22.10.6)$$

that is, the variance of $u_t$ follows an ARCH(1) process.

23You might wonder why we do not use the variance of $X_t$, $\sum X_t^2/n$, as a measure of volatility. This is because we want
to take into account changing volatility of asset prices over time. If we use the variance of $X_t$, it will be only a single value
for a given data set.
The normality of $u_t$ is not new to us. What is new is that the variance of $u_t$ at time t depends on the
squared disturbance at time (t - 1), thus giving the appearance of serial correlation.24 Of course, the error
variance may depend not only on one lagged term of the squared error term but also on several lagged
squared terms as follows:


$$\operatorname{var}(u_t) = \sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \cdots + \alpha_p u_{t-p}^2 \qquad (22.10.7)$$
If there is no autocorrelation in the error variance, we have

$$H_0\colon \alpha_1 = \alpha_2 = \cdots = \alpha_p = 0 \qquad (22.10.8)$$


in which case $\operatorname{var}(u_t) = \alpha_0$, and we do not have the ARCH effect.
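To see what the ARCH(1) specification (22.10.5)-(22.10.6) implies, one can simulate disturbances whose conditional variance depends on the previous squared disturbance. A minimal sketch, assuming NumPy and illustrative parameter values $\alpha_0 = 0.0004$, $\alpha_1 = 0.23$ (chosen only to be of the same order as the estimates above, not taken from any data). It compares the sample variance with the unconditional variance $\alpha_0/(1 - \alpha_1)$ of footnote 24, and checks that $u_t^2$ is positively autocorrelated (with lag-1 autocorrelation near $\alpha_1$ when the fourth moment exists, which requires $3\alpha_1^2 < 1$):

```python
import math
import numpy as np

def simulate_arch1(alpha0, alpha1, T, seed=0):
    """Simulate T disturbances with var(u_t | u_{t-1}) = alpha0 + alpha1 * u_{t-1}**2."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(T)                            # i.i.d. N(0, 1) innovations
    u = np.empty(T)
    u[0] = math.sqrt(alpha0 / (1.0 - alpha1)) * z[0]      # start at the unconditional variance
    for t in range(1, T):
        sigma2_t = alpha0 + alpha1 * u[t - 1] ** 2        # conditional variance, Eq. (22.10.6)
        u[t] = math.sqrt(sigma2_t) * z[t]
    return u

alpha0, alpha1 = 0.0004, 0.23                             # illustrative values, not estimates
u = simulate_arch1(alpha0, alpha1, T=200_000)
unconditional = alpha0 / (1.0 - alpha1)                   # valid when alpha1 < 1
sq = u ** 2
lag1_corr = np.corrcoef(sq[1:], sq[:-1])[0, 1]            # volatility clustering shows up here
print(f"sample variance {u.var():.6f} vs alpha0/(1 - alpha1) = {unconditional:.6f}")
print(f"correlation of u_t^2 with u_(t-1)^2: {lag1_corr:.3f}")
```

The levels $u_t$ are serially uncorrelated; only their squares are correlated, which is the distinction between ARCH and ordinary autocorrelation.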
Since we do not directly observe $\sigma_t^2$, Engle has shown that the preceding null hypothesis can easily be tested
by running the following regression:


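$$\hat{u}_t^2 = \hat{\alpha}_0 + \hat{\alpha}_1 \hat{u}_{t-1}^2 + \hat{\alpha}_2 \hat{u}_{t-2}^2 + \cdots + \hat{\alpha}_p \hat{u}_{t-p}^2 \qquad (22.10.9)$$

where $\hat{u}_t$, as usual, denotes the OLS residuals obtained from the original regression model (22.10.4).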


One can test the null hypothesis $H_0$ by the usual F test or, alternatively, by computing $nR^2$, where $R^2$ is the
coefficient of determination from the auxiliary regression (22.10.9). It can be shown that

$$nR^2 \overset{\text{asy}}{\sim} \chi^2_p \qquad (22.10.10)$$

that is, in large samples $nR^2$ follows the chi-square distribution with df equal to the number of autoregressive
terms in the auxiliary regression.
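The auxiliary regression (22.10.9) and the $nR^2$ statistic (22.10.10) take only a few lines to compute. A minimal sketch, assuming NumPy; `arch_lm_test` is a hypothetical helper, not a library function, and the 5 percent chi-square critical values for 1 to 4 df are hard-coded rather than computed:

```python
import numpy as np

# 5 percent critical values of the chi-square distribution, df = 1, ..., 4
CHI2_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def arch_lm_test(resid, p=1):
    """Engle's test: n*R^2 from regressing u_t^2 on a constant and p lags of u_t^2.

    Returns (nR2, n, reject_at_5pct), where n is the number of observations
    in the auxiliary regression (22.10.9).
    """
    u2 = np.asarray(resid, dtype=float) ** 2
    T = len(u2)
    y = u2[p:]
    X = np.column_stack([np.ones(T - p)] + [u2[p - j:T - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    r2 = 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)
    n = T - p
    nr2 = n * r2
    return nr2, n, nr2 > CHI2_5PCT[p]

# Residuals built with an ARCH(1) structure (alpha1 = 0.3), for illustration.
rng = np.random.default_rng(42)
T = 5000
z = rng.standard_normal(T)
u = np.empty(T)
u[0] = z[0]
for t in range(1, T):
    u[t] = np.sqrt(1.0 + 0.3 * u[t - 1] ** 2) * z[t]
print(arch_lm_test(u, p=1))
```

In practice `resid` would be the OLS residuals $\hat{u}_t$ from (22.10.4), and p would be chosen as discussed for (22.10.2).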

Before we proceed to illustrate, make sure that you do not confuse autocorrelation of the error term, as
discussed in Chapter 12, with the ARCH model. In the ARCH model it is the (conditional) variance of $u_t$ that
depends on the (squared) previous error terms, thus giving the impression of autocorrelation.


Example 22.2 New York Stock Exchange Price Changes 


As a further illustration of the ARCH effect, Figure 22.8 presents the monthly percentage change in the NYSE (New
York Stock Exchange) Index for the period 1966-2002.25 It is evident from this graph that the percent price
changes in the NYSE Index exhibit considerable volatility. Notice especially the wide swing around the 1987
crash in stock prices.

To capture the volatility in the stock return seen in the figure, let us consider a very simple model: 


$$Y_t = \beta_1 + u_t \qquad (22.10.11)$$


where $Y_t$ = percent change in the NYSE stock index and $u_t$ = random error term.
Notice that besides the intercept, there is no other explanatory variable in the model. From the data, we 
obtained the following OLS regression: 


$$\hat{Y}_t = 0.00574 \qquad (22.10.12)$$
t = (3.36)   $d = 1.4915$


24A technical note: Remember that for our classical linear model the variance of $u_t$ was assumed to be $\sigma^2$, which in the
present context becomes the unconditional variance. If $\alpha_1 < 1$, the stability condition, we can write $\sigma^2 = \alpha_0 + \alpha_1\sigma^2$;
that is, $\sigma^2 = \alpha_0/(1 - \alpha_1)$. This shows that the unconditional variance of $u_t$ does not depend on $t$, but does depend on
the ARCH parameter $\alpha_1$.
25This graph and the regression results presented in this example are based on the data collected by Gary Koop, Analysis of 
Economic Data, John Wiley & Sons, New York, 2000 (data from the data disk). The monthly percentage change in the stock 
price index can be regarded as a rate of return on the index. 
Figure 22.8 Monthly percent change in the NYSE Price Index, 1966-2002.


What does this intercept denote? It is simply the average rate of return on the NYSE index, or the
mean value of $Y_t$ (can you verify this?). Thus over the sample period the average monthly return on the NYSE
index was about 0.00574, that is, about 0.57 percent.
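The verification is short: with a constant as the only regressor, the OLS normal equation is $\sum (Y_t - \hat{\beta}_1) = 0$, so $\hat{\beta}_1 = \bar{Y}$. A quick numerical check, assuming NumPy and a handful of made-up monthly returns:

```python
import numpy as np

returns = np.array([0.012, -0.034, 0.027, 0.008, -0.005, 0.021])  # made-up monthly returns
X = np.ones((len(returns), 1))                    # the intercept is the only regressor
beta, *_ = np.linalg.lstsq(X, returns, rcond=None)
print(beta[0], returns.mean())                    # OLS intercept and sample mean coincide
```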

Now we obtain the residuals from the preceding regression and estimate the ARCH(1) model, which gave 
the following results: 


$$\hat{u}_t^2 = 0.000007 + 0.25406\,\hat{u}_{t-1}^2 \qquad (22.10.13)$$
t = (0.000)   (5.52)
$R^2 = 0.0645$   $d = 1.9464$

where $\hat{u}$ is the estimated residual from regression (22.10.12).
Since the lagged squared disturbance term is statistically significant (p value of about 0.000), it seems the
error variances are correlated; that is, there is an ARCH effect. We tried higher-order ARCH models but only
ARCH(1) was statistically significant.


What to Do If ARCH Is Present 


Recall that we have discussed several methods of correcting for heteroscedasticity, which basically involved
applying OLS to transformed data. Remember that OLS applied to transformed data is generalized least
squares (GLS). If the ARCH effect is found, we will have to use GLS. We will not pursue the technical details,
for they are beyond the scope of this book.26 Fortunately, software packages such as EViews, SHAZAM,
MICROFIT, and PC-GIVE now have user-friendly routines to estimate such models.




A Word on the Durbin—Watson d and the ARCH Effect 


We have reminded the reader several times that a significant d statistic may not always mean that there is
significant autocorrelation in the data at hand. Very often a significant d value is an indication of the model
specification errors that we discussed in Chapter 13. Now we have an additional specification error, due to
the ARCH effect. Therefore, in a time series regression, if a significant d value is obtained, we should test
for the ARCH effect before accepting the d statistic at its face value. An example is given in Exercise 22.23.

26Consult Russell Davidson and James G. MacKinnon, Estimation and Inference in Econometrics, Oxford University Press, New
York, 1993, Section 16.4, and William H. Greene, Econometric Analysis, 4th ed., Prentice Hall, Englewood Cliffs, NJ, 2000,
Section 18.5.


A Note on the GARCH Model 


Since its “discovery” in 1982, ARCH modeling has become a growth industry, with all kinds of
variations on the original model. One that has become popular is the generalized autoregressive conditional
heteroscedasticity (GARCH) model, originally proposed by Bollerslev.27 The simplest GARCH model is
the GARCH(1, 1) model, which can be written as:


$$\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 \sigma_{t-1}^2 \qquad (22.10.14)$$


which says that the conditional variance of u at time t depends not only on the squared error term in the 
previous time period (as in ARCH[1]) but also on its conditional variance in the previous time period. This 
model can be generalized to a GARCH(p, q) model in which there are p lagged terms of the squared error 
term and q terms of the lagged conditional variances. 

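We will not pursue the technical details of these models, as they are involved, except to point out that,
by repeatedly substituting for the lagged conditional variance, a GARCH(1, 1) model can be written as an
ARCH model with infinitely many lagged squared error terms whose coefficients decline geometrically; a
low-order GARCH(p, q) model is thus a parsimonious way of representing a high-order ARCH process.28

For our U.S./U.K. exchange rate and NYSE stock return examples, we have already stated that ARCH terms
beyond the first lag were not significant, which suggests that the simple ARCH(1) specification may be
adequate in these cases; whether a GARCH(1, 1) model fits better is an empirical question.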


22.11 Concluding Examples 


We conclude this chapter by considering a few additional examples that illustrate some of the points we have 
made in this chapter. 


Example 22.3 The Relationship between the Help-Wanted Index (HWI) and the Unemployment
Rate (UN) from January 1969 to January 2000


To study causality between HWI and UN, two indicators of labor market conditions in the United States, Marc
A. Giammatteo considered the following regression models:29

$$HWI_t = \alpha_0 + \sum_{i=1}^{25} \alpha_i UN_{t-i} + \sum_{j=1}^{25} \beta_j HWI_{t-j} \qquad (22.11.1)$$

$$UN_t = \lambda_0 + \sum_{i=1}^{25} \lambda_i UN_{t-i} + \sum_{j=1}^{25} \delta_j HWI_{t-j} \qquad (22.11.2)$$

To save space we will not present the actual regression results, but the main conclusion that emerges from 
this study is that there is bilateral causality between the two labor market indicators and this conclusion did 
not change when the lag length was varied. The data on HWI and UN are given on the textbook website as 
Table 22.5. 


27T. Bollerslev, “Generalized Autoregressive Conditional Heteroscedasticity,” Journal of Econometrics, vol. 31, 1986,
pp. 307-326.

28For details, see Davidson and MacKinnon, op. cit., pp. 558-560.

29Marc A. Giammatteo (West Point, Class of 2000), “The Relationship between the Help Wanted Index and the Unemployment
Rate,” unpublished term paper. (Notations altered to conform to our notation.)
Example 22.4 ARIMA Modeling of the Yen/Dollar Exchange Rate: January 1971 to April 2008


The yen/dollar exchange rate (¥/$) is a key exchange rate. From the logarithms of the monthly ¥/$, it was 
found that in the level form this exchange rate showed the typical pattern of a nonstationary time series. But 
examining the first differences, it was found that they were stationary; the graph here pretty much resembles 
Figure 22.8. 

Unit root analysis confirmed that the first differences of the logs of ¥/$ were stationary. After examining the
correlogram of the first differences of the logs, we estimated the following MA(1) model:

$$\hat{Y}_t = -0.0028 - 0.3300\,u_{t-1} \qquad (22.11.3)$$
t = ([illegible])   (-7.22)
$R^2 = 0.1012$   $d = 1.9808$


where $Y_t$ = first differences of the logs of ¥/$ and $u$ = a white noise error term.

To save space, we have provided the data underlying the preceding analysis on the textbook website in
Table 22.6. Using these data, the reader is urged to try other models and compare their forecasting
performance.


Example 22.5 ARCH Model of the U.S. Inflation Rate: January 1947 to March 2008 


To see if the ARCH effect is present in the U.S. inflation rate as measured by the CPI, we obtained CPI data 
from January 1947 to March 2008. The plot of the logarithms of the CPI showed that the time series was 
nonstationary. But the plot of the first differences of the logs of the CPI, as shown in Figure 22.9, shows 
considerable volatility even though the first differences are stationary. 

Following the procedure outlined in regressions (22.10.12) and (22.10.13), we first regressed the first
differences of the logged CPI on a constant and obtained the residuals from this equation. Regressing the
squared residuals on their own lagged values, we obtained the following ARCH(2) model:

$$\hat{u}_t^2 = 0.000028 + 0.12125\,\hat{u}_{t-1}^2 + 0.08718\,\hat{u}_{t-2}^2 \qquad (22.11.4)$$
t = (5.42)   (3.34)   (2.41)
$R^2 = 0.026$   $d = 2.0214$
Figure 22.9 First differences of the logs of CPI.
As you can see, there is quite a bit of persistence in the volatility, as volatility in the current month depends on
volatility in the preceding 2 months. The reader is advised to obtain CPI data from government sources and 
try to see if another model, preferably a GARCH model, does a better job. 


Summary and Conclusions 


1. Box-Jenkins and VAR approaches to economic forecasting are alternatives to traditional single- and
simultaneous-equation models.
2. To forecast the values of a time series, the basic Box-Jenkins strategy is as follows:
   a. First examine the series for stationarity. This step can be done by computing the autocorrelation
      function (ACF) and the partial autocorrelation function (PACF) or by a formal unit root analysis.
      The correlograms associated with ACF and PACF are often good visual diagnostic tools.
   b. If the time series is not stationary, difference it one or more times to achieve stationarity.
   c. The ACF and PACF of the stationary time series are then computed to find out if the series is purely
      autoregressive or purely of the moving average type or a mixture of the two. From broad guidelines
      given in Table 22.1 one can then determine the values of p and q in the ARMA process to be fitted.
      At this stage the chosen ARMA(p, q) model is tentative.
   d. The tentative model is then estimated.
   e. The residuals from this tentative model are examined to find out if they are white noise. If they are,
      the tentative model is probably a good approximation to the underlying stochastic process. If they
      are not, the process is started all over again. Therefore, the Box-Jenkins method is iterative.
   f. The model finally selected can be used for forecasting.
3. The VAR approach to forecasting considers several time series at a time. The distinguishing features of
VAR are as follows:
   a. It is a truly simultaneous system in that all variables are regarded as endogenous.
   b. In VAR modeling the value of a variable is expressed as a linear function of the past, or lagged,
      values of that variable and all other variables included in the model.
   c. If each equation contains the same number of lagged variables in the system, it can be estimated by
      OLS without resorting to any systems method, such as two-stage least squares (2SLS) or seemingly
      unrelated regressions (SURE).
   d. This simplicity of VAR modeling may be its drawback. In view of the limited number of
      observations that are generally available in most economic analyses, introduction of several lags of each
      variable can consume a lot of degrees of freedom.30
   e. If there are several lags in each equation, it is not always easy to interpret each coefficient, especially
      if the signs of the coefficients alternate. For this reason one examines the impulse response function
      (IRF) in VAR modeling to find out how the dependent variable responds to a shock administered to
      one or more equations in the system.
4. There is considerable debate and controversy about the superiority of the various forecasting
methods. Single-equation, simultaneous-equation, Box-Jenkins, and VAR methods of forecasting
have their admirers as well as their detractors. All one can say is that there is no single method
that will suit all situations. If that were the case, there would be no need for discussing the various
alternatives. One thing is sure: The Box-Jenkins and VAR methodologies have now become an
integral part of econometrics.


30Followers of Bayesian statistics believe that this problem can be minimized. See R. Litterman, “A Statistical Approach to 
Economic Forecasting,” Journal of Business and Economic Statistics, vol. 4, 1986, pp. 1-4. 
5. We also considered in this chapter a special class of models, ARCH and GARCH, which are especially
useful in analyzing financial time series, such as stock prices, inflation rates, and exchange rates. A
distinguishing feature of these models is that the error variance may be correlated over time because
of the phenomenon of volatility clustering. In this connection we also pointed out that in many cases a
significant Durbin-Watson d may in fact be due to the ARCH or GARCH effect.
6. There are variants of ARCH and GARCH models, but we have not considered them in this chapter
due to space constraints. Some of these other models are GARCH-M (GARCH in mean), TGARCH
(threshold GARCH), and EGARCH (exponential GARCH). A discussion of these models can be
found in the references.31
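The identification steps in point 2 rest on the sample autocorrelation function. A minimal sketch, assuming NumPy and the usual estimator $\hat{\rho}_k = \sum_t (Y_t - \bar{Y})(Y_{t+k} - \bar{Y}) / \sum_t (Y_t - \bar{Y})^2$, together with the approximate 95 percent band $\pm 1.96/\sqrt{n}$ commonly drawn on correlograms; `sample_acf` is a hypothetical helper and the random walk is simulated:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_0, ..., rho_max_lag of a 1-D series."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    denom = d @ d
    return np.array([(d[: len(d) - k] @ d[k:]) / denom for k in range(max_lag + 1)])

# A simulated random walk is nonstationary; its first difference is white noise.
rng = np.random.default_rng(3)
walk = np.cumsum(rng.standard_normal(500))
diffs = np.diff(walk)
band = 1.96 / np.sqrt(len(diffs))
print("levels:     ", np.round(sample_acf(walk, 5), 2))    # decays very slowly
print("differences:", np.round(sample_acf(diffs, 5), 2))   # compare with the band below
print("95% band:    +/-", round(band, 3))
```

A slowly decaying ACF in levels and a flat one after differencing is the pattern that steps a and b look for.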


31See Walter Enders, Applied Econometric Time Series, 2d ed., John Wiley & Sons, New York, 2004. For an application-oriented
discussion, see Dimitrios Asteriou and Stephen Hall, Applied Econometrics: A Modern Approach, revised edition, Palgrave/
Macmillan, New York, 2007, Chapter 14.


Multiple Choice Questions

1. An approach to forecasting using time-series data is
a. Single equation regression model
b. Simultaneous equation regression model
c. Vector autoregression
d. All of the above
2. The method of fitting a suitable curve to historical data of a given time series is known as
a. Exponential smoothing method
b. Single-equation regression model
c. ARIMA model
d. VAR model
3. The method used to analyze the stochastic properties of economic time series on their own is known as
a. Exponential smoothing method
b. Single-equation regression model
c. ARIMA model
d. VAR model
4. The methodology used to fit a simultaneous-equation model where all variables are considered
endogenous, with no exogenous variable, is
a. Exponential smoothing method
b. Single-equation regression model
c. ARIMA model
d. VAR model
5. An example of a model usually not derived from any economic theory is
a. Single equation regression model
b. Simultaneous equation regression model
c. ARIMA
d. All the above
6. Applying different forecasting approaches to time series data requires that the time-series process be
a. Normally distributed
b. Stationary
c. Non-stationary
d. Sample size of at least 30 observations
7. The model where the value of Y depends only on its value in the previous time period and a random
term is
a. Single equation model
b. AR(1) model
c. MA(1) model
d. ARMA (1, 1) model
8. The model in which Y depends on the current and previous time period error terms is
a. Single equation model
b. AR(1) model
c. MA(1) model
d. ARMA (1, 1) model
9. If a time series is integrated of order d, to make this series stationary we need to
a. Difference it ‘d’ times
b. Difference it ‘d - 1’ times
c. Difference it ‘d + 1’ times
d. Detrend it ‘d’ times
10. A weakly stationary stochastic process is one where the mean and variance are constant and
a. Mean is equal to zero
b. They follow normal distribution
c. Its covariance is time-invariant
d. Its correlation is time-invariant
11. ARIMA (1, 2, 3) means
a. The series has to be first differenced to make it stationary
b. The series has to be differenced twice to make it stationary
c. The series has to be differenced thrice to make it stationary
d. Cannot say about the stationary condition from the given information
12. ARIMA (1, 2, 3) means
a. First differenced stationary time series can be modeled as an ARMA (2, 3)
b. Two times differenced stationary time series can be modeled as an ARMA (1, 3)
c. Three times differenced stationary time series can be modeled as an ARMA (1, 2)
d. The series can be modeled as ARMA (3, 3)
13. ARIMA (p, 0, 0) means the stochastic process is a
a. AR (p) stationary process
b. MA (p) stationary process
c. Time series that needs to be differenced p times to make it stationary
d. Nonstationary series with p lags
14. ARIMA (0, 0, q) means the stochastic process is a
a. AR (q) stationary process
b. MA (q) stationary process
c. Time series that needs to be differenced q times to make it stationary
d. Non-stationary series with q lags
15. Using the Box-Jenkins method, a model is chosen if the residuals estimated from the model are
a. Stationary
b. Weakly stationary
c. White noise
d. Non-stationary
16. A correlogram is
a. Test statistics used to test the chosen ARIMA model for goodness of fit
b. Plots of autocorrelation function and partial autocorrelation function against lag length
c. Plots of autocorrelation function and partial autocorrelation function against time
d. Plots of error term against time
17. Under the Box-Jenkins method, which of the following tools is used for identification of a model?
a. Autocorrelation function
b. The partial autocorrelation function
c. Correlograms
d. All of the above
18. Partial autocorrelation measures correlation (controlling for correlation at intermediate lags) between
observations that are
a. k time periods apart
b. k + 1 time periods apart
c. 2k time periods apart
d. 2k + 1 time periods apart
19. The response of the dependent variable in the VAR system to shocks in the error terms is traced by
a. Volatility clustering
b. Impulse response function
c. Volatility
d. Partial autocorrelation function
20. Data such as stock prices exhibit periods in which their prices show wide swings for an extended time
period followed by periods in which there is relative calm. This phenomenon is known as
a. Volatility clustering
b. Impulse response function
c. Volatility
d. Partial autocorrelation function


Exercises


Questions

22.1. What are the major methods of economic forecasting?

22.2. What are the major differences between simultaneous-equation and Box-Jenkins approaches to
economic forecasting?

22.3. Outline the major steps involved in the application of the Box-Jenkins approach to forecasting.

22.4. What happens if Box-Jenkins techniques are applied to time series that are nonstationary?

22.5. What are the differences between Box-Jenkins and VAR approaches to economic forecasting?

22.6. In what sense is VAR atheoretic?

22.7. “If the primary object is forecasting, VAR will do the job.” Critically evaluate this statement.

22.8. Since the number of lags to be introduced in a VAR model can be a subjective question, how does one
decide how many lags to introduce in a concrete application?
22.9. Comment on this statement: “Box-Jenkins and VAR are prime examples of measurement without
theory.” 


22.10. What is the connection, if any, between Granger causality tests and VAR modeling? 


Empirical Exercises 


22.11. Consider the data on log DPI (personal disposable income) introduced in Section 21.1 (see the book’s 
website for the actual data). Suppose you want to fit a suitable ARIMA model to these data. Outline 
the steps involved in carrying out this task. 

22.12. Repeat Exercise 22.11 for the LPCE (personal consumption expenditure) data introduced in Section 
21.1 (again, see the book’s website for the actual data). 

22.13. Repeat Exercise 22.11 for the LCP. 

22.14. Repeat Exercise 22.11 for the LDIVIDENDS.

22.15. In Section 13.9 you were introduced to the Schwarz Information Criterion (SIC) to determine lag 
length. How would you use this criterion to determine the appropriate lag length in a VAR model? 

22.16. Using the data on LPCE and LDPI introduced in Section 21.1 (see the book’s website for the actual
data), develop a bivariate VAR model for the period 1970-I to 2006-IV. Use this model to forecast
the values of these variables for the four quarters of 2007 and compare the forecast values with the
actual values given in the dataset.

22.17. Repeat Exercise 22.16, using the data on LDIVIDENDS and LCP.

*22.18. Refer to any statistical package and estimate the impulse response function for a period of up to 8 lags
for the VAR model that you developed in Exercise 22.16. 

22.19. Repeat Exercise 22.18 for the VAR model that you developed in Exercise 22.17. 

22.20. Refer to the VAR regression results given in Table 22.4. From the various F tests reported in the three 
regressions given there, what can you say about the nature of causality in the three variables? 

22.21. Continuing with Exercise 22.20, can you guess why the authors chose to express the three variables
in the model in percentage change form rather than using the levels of these variables? (Hint: Stationarity.)

22.22. Using the Canadian data given in Table 17.5, find out if M1 and R are stationary random variables. If
not, are they cointegrated? Show the necessary calculations.

22.23. Continue with the data given in Table 17.5. Now consider the following simple model of money 
demand in Canada: 


$$\ln M_{1t} = \beta_1 + \beta_2 \ln \mathrm{GDP}_t + \beta_3 \ln R_t + u_t$$


a. How would you interpret the parameters of this model? 
b. Obtain the residuals from this model and find out if there is any ARCH effect. 
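Part (b) is usually answered with Engle's ARCH test. Regress the squared residuals on their own lagged values and compare nR² with the χ² critical value, which is 3.84146 for 1 df at the 5% level (Table D.4). The sketch below is a minimal ARCH(1) version in Python. It assumes the residuals from the money-demand regression are already available as a list; the example feeds it simulated residuals instead.

```python
import math
import random


def arch1_lm(resid):
    """Engle's ARCH(1) test: regress u_t^2 on a constant and u_{t-1}^2 and return n * R^2,
    approximately chi-square with 1 df under the null of no ARCH effect."""
    u2 = [e * e for e in resid]
    y, x = u2[1:], u2[:-1]
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return n * sxy * sxy / (sxx * syy)  # in a two-variable regression R^2 = r^2


if __name__ == "__main__":
    random.seed(7)
    u = [0.0]
    for _ in range(2000):  # simulated ARCH(1): var(u_t) = 0.2 + 0.5 u_{t-1}^2
        u.append(math.sqrt(0.2 + 0.5 * u[-1] ** 2) * random.gauss(0, 1))
    lm = arch1_lm(u[1:])
    print(round(lm, 1), "ARCH effect" if lm > 3.84146 else "no evidence of ARCH")
```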

22.24. Refer to the ARCH(2) model given in Eq. (22.11.4). Using the same data, we estimated the following 
ARCH(1) model: 


$$\hat{u}_t^2 = 0.00000078 + 0.3737\,\hat{u}_{t-1}^2$$


t = (7.5843)   (10.2351) 
R? = 0591297 d = 1.9896 


How would you choose between the two models? Show the necessary calculations. 
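There are two natural ways to choose. One is to test whether the coefficient of û²_{t-2} in the ARCH(2) regression is statistically significant. The other is to compare the criteria of Section 13.9, ln AIC = 2k/n + ln(RSS/n) and ln SIC = (k/n) ln n + ln(RSS/n), computed on the same sample, and prefer the smaller value. The Python below is a minimal sketch of the second approach. The exercise does not report the RSS values, so the figures in the example are hypothetical; they show that the two criteria can disagree.

```python
import math


def ln_aic(rss, n, k):
    """ln AIC = 2k/n + ln(RSS/n)."""
    return 2.0 * k / n + math.log(rss / n)


def ln_sic(rss, n, k):
    """ln SIC = (k/n) ln n + ln(RSS/n)."""
    return k / n * math.log(n) + math.log(rss / n)


def preferred(models, criterion=ln_sic):
    """models: dict name -> (rss, n, k), all fitted on the same sample."""
    return min(models, key=lambda name: criterion(*models[name]))


if __name__ == "__main__":
    hypothetical = {"ARCH(1)": (0.0150, 500, 2), "ARCH(2)": (0.0149, 500, 3)}
    print(preferred(hypothetical), preferred(hypothetical, ln_aic))  # ARCH(1) ARCH(2)
```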


*Optional 
22.25. Table 22.7 gives data on three-month (TB3M) and six-month (TB6M) Treasury bill rates from 
January 1, 1982, to March 2008, for a total of 315 monthly observations. The data can be found on 
the textbook’s website. 

a. Plot the two time series in the same diagram. What do you see? 

b. Do a formal unit root analysis to find out if these time series are stationary. 

c. Are the two time series cointegrated? How do you know? Show the necessary calculations. 

d. What is the economic meaning of cointegration in the present context? If the two series are not 
cointegrated, what are the economic implications? 

e. If you want to estimate a VAR model, say, with four lags of each variable, do you have to use the 
first differences of the two series or can you do the analysis in levels of the two series? Justify your 
answer. 

22.26. Class Exercise: Pick a stock market index of your choosing and obtain daily data on the value of the 
chosen index for five years to find out if the stock index is characterized by ARCH effects. 

22.27. Class Exercise: Collect data on inflation and unemployment rates in the U.S. for the quarterly periods 
in 1980-2007 and develop and estimate a VAR model for the two variables. To compute the inflation 
rate, use CPI (consumer price index) and use the civilian unemployment rate for the unemployment 
rate. Pay careful attention to the stationarity of these variables. Also, find out if one variable Granger-causes 
the other variable. Present all your calculations. 


Key to Multiple Choice Questions 


1. (d) 2. (a) SEC) 4. (d) Site) 6. (b) 7. (b) 8. (c) 9. (a) 
10. (c) 11. (b) 12. (b) 13. (a) 14. (a) 15. (c) 16. (b) 17. (d) 18. (a) 
19. (b) 20. (a) 
Statistical Tables 


Table D.1 Areas under the Standardized Normal Distribution 

Table D.2 Percentage Points of the t Distribution 

Table D.3 Upper Percentage Points of the F Distribution 

Table D.4 Upper Percentage Points of the χ² Distribution 

Table D.5A Durbin-Watson d Statistic: Significance Points of dL and dU at 0.05 Level of Significance 
Table D.5B Durbin-Watson d Statistic: Significance Points of dL and dU at 0.01 Level of Significance 
Table D.6 Critical Values of Runs in the Runs Test 

Table D.7 1% and 5% Critical Dickey-Fuller t (= τ) and F Values for Unit Root Tests 


Appendices A, B, C, E and F can be accessed at: www.mhhe.com/sie-gujarati5e 
Table D.1 Areas Under the Standardized Normal Distribution 
Example 
Pr(0 < Z < 1.96) = 0.4750 

Pr(Z > 1.96) = 0.5 - 0.4750 = 0.025 

   Z    .03    .06    .07    .08    .09
 0.0  .0120  .0239  .0279  .0319  .0359
 0.1  .0517  .0636  .0675  .0714  .0753
 0.2  .0910  .1026  .1064  .1103  .1141
 0.3  .1293  .1406  .1443  .1480  .1517
 0.4  .1664  .1772  .1808  .1844  .1879
 0.5  .2019  .2123  .2157  .2190  .2224
 0.6  .2357  .2454  .2486  .2517  .2549
 0.7  .2673  .2764  .2794  .2823  .2852
 0.8  .2967  .3051  .3078  .3106  .3133
 0.9  .3238  .3315  .3340  .3365  .3389
 1.0  .3485  .3554  .3577  .3599  .3621
 1.1  .3708  .3770  .3790  .3810  .3830
 1.2  .3907  .3962  .3980  .3997  .4015
 1.3  .4082  .4131  .4147  .4162  .4177
 1.4  .4236  .4279  .4292  .4306  .4319
 1.5  .4370  .4406  .4418  .4429  .4441
 1.6  .4484  .4515  .4525  .4535  .4545
 1.7  .4582  .4608  .4616  .4625  .4633
 1.8  .4664  .4686  .4693  .4699  .4706
 1.9  .4732  .4750  .4756  .4761  .4767
 2.0  .4788  .4803  .4808  .4812  .4817
 2.1  .4834  .4846  .4850  .4854  .4857
 2.2  .4871  .4881  .4884  .4887  .4890
 2.3  .4901  .4909  .4911  .4913  .4916
 2.4  .4925  .4931  .4932  .4934  .4936
 2.5  .4943  .4948  .4949  .4951  .4952
 2.6  .4957  .4961  .4962  .4963  .4964
 2.7  .4968  .4971  .4972  .4973  .4974
 2.8  .4977  .4979  .4979  .4980  .4981
 2.9  .4983  .4985  .4985  .4986  .4986
 3.0  .4988  .4989  .4989  .4990  .4990


Note: This table gives the area in the right-hand half of the distribution, that is, the area between 0 and a positive value of Z. But since the normal distribution is 
symmetrical about Z = 0, the area in the left-hand half is the same as the area in the corresponding right-hand half. For example, 
P(-1.96 < Z < 0) = 0.4750. Therefore, P(-1.96 < Z < 1.96) = 2(0.4750) = 0.95. 
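The entries in Table D.1 can be reproduced with the error function in Python's standard library, since P(0 < Z < z) = Φ(z) - 0.5 = erf(z/√2)/2. The sketch below is a minimal check of the worked example, not the method used to compute the printed table.

```python
import math


def area_0_to_z(z):
    """P(0 < Z < z) for a standard normal Z, using Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * math.erf(z / math.sqrt(2.0))


def upper_tail(z):
    """P(Z > z) for z >= 0."""
    return 0.5 - area_0_to_z(z)


if __name__ == "__main__":
    print(round(area_0_to_z(1.96), 4))      # 0.475
    print(round(upper_tail(1.96), 3))       # 0.025
    print(round(2 * area_0_to_z(1.96), 2))  # 0.95
```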
Table D.2 Percentage Points of the t Distribution 


Example 
Pr(t > 2.086) = 0.025 
Pr(t > 1.725) = 0.05    for df = 20 


Pr(|t| > 1.725) = 0.10 


both tails. 
Source: From E. S. Pearson and H. O. Hartley, eds., Biometrika Tables for Statisticians, vol. 1, 3d ed., table 12, Cambridge 
University Press, New York, 1966. Reproduced by permission of the editors and trustees of Biometrika. 
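The t-table example can be checked numerically with a rough standard-library approach. Integrate the t density from 0 to x with Simpson's rule and subtract the result from 0.5. This is an illustrative sketch, not how the Biometrika table was computed.

```python
import math


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(x, df, steps=2000):
    """P(t > x) for x >= 0: 0.5 minus the area from 0 to x, by Simpson's rule (steps even)."""
    h = x / steps
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - s * h / 3


if __name__ == "__main__":
    print(round(t_upper_tail(2.086, 20), 3))  # 0.025
    print(round(t_upper_tail(1.725, 20), 3))  # 0.05
```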
Table D.3 Upper Percentage Points of the F Distribution 
Example 

Pr(F > 1.59) = 0.25 
Pr(F > 2.42) = 0.10    for df N1 = 10 
Pr(F > 3.14) = 0.05    and N2 = 9 
Pr(F > 5.26) = 0.01 

df for
denom-                            df for numerator N1
inator
N2   Pr     1     2     3     4     5     6     7     8     9    10    11    12
 1  .25  5.83  7.50  8.20  8.58  8.82  8.98  9.10  9.19  9.26  9.32  9.36  9.41
    .10  39.9  49.5  53.6  55.8  57.2  58.2  58.9  59.4  59.9  60.2  60.5  60.7
    .05   161   200   216   225   230   234   237   239   241   242   243   244
 2  .25  2.57  3.00  3.15  3.23  3.28  3.31  3.34  3.35  3.37  3.38  3.39  3.39
    .10  8.53  9.00  9.16  9.24  9.29  9.33  9.35  9.37  9.38  9.39  9.40  9.41
    .05  18.5  19.0  19.2  19.2  19.3  19.3  19.4  19.4  19.4  19.4  19.4  19.4
    .01  98.5  99.0  99.2  99.2  99.3  99.3  99.4  99.4  99.4  99.4  99.4  99.4
 3  .25  2.02  2.28  2.36  2.39  2.41  2.42  2.43  2.44  2.44  2.44  2.45  2.45
    .10  5.54  5.46  5.39  5.34  5.31  5.28  5.27  5.25  5.24  5.23  5.22  5.22
    .05  10.1  9.55  9.28  9.12  9.01  8.94  8.89  8.85  8.81  8.79  8.76  8.74
    .01  34.1  30.8  29.5  28.7  28.2  27.9  27.7  27.5  27.3  27.2  27.1  27.1
 4  .25  1.81  2.00  2.05  2.06  2.07  2.08  2.08  2.08  2.08  2.08  2.08  2.08
    .10  4.54  4.32  4.19  4.11  4.05  4.01  3.98  3.95  3.94  3.92  3.91  3.90
    .05  7.71  6.94  6.59  6.39  6.26  6.16  6.09  6.04  6.00  5.96  5.94  5.91
    .01  21.2  18.0  16.7  16.0  15.5  15.2  15.0  14.8  14.7  14.5  14.4  14.4
 5  .25  1.69  1.85  1.88  1.89  1.89  1.89  1.89  1.89  1.89  1.89  1.89  1.89
    .10  4.06  3.78  3.62  3.52  3.45  3.40  3.37  3.34  3.32  3.30  3.28  3.27
    .05  6.61  5.79  5.41  5.19  5.05  4.95  4.88  4.82  4.77  4.74  4.71  4.68
    .01  16.3  13.3  12.1  11.4  11.0  10.7  10.5  10.3  10.2  10.1  9.96  9.89
 6  .25  1.62  1.76  1.78  1.79  1.79  1.78  1.78  1.78  1.77  1.77  1.77  1.77
    .10  3.78  3.46  3.29  3.18  3.11  3.05  3.01  2.98  2.96  2.94  2.92  2.90
    .05  5.99  5.14  4.76  4.53  4.39  4.28  4.21  4.15  4.10  4.06  4.03  4.00
    .01  13.7  10.9  9.78  9.15  8.75  8.47  8.26  8.10  7.98  7.87  7.79  7.72
 7  .25  1.57  1.70  1.72  1.72  1.71  1.71  1.70  1.70  1.69  1.69  1.69  1.68
    .10  3.59  3.26  3.07  2.96  2.88  2.83  2.78  2.75  2.72  2.70  2.68  2.67
    .05  5.59  4.74  4.35  4.12  3.97  3.87  3.79  3.73  3.68  3.64  3.60  3.57
    .01  12.2  9.55  8.45  7.85  7.46  7.19  6.99  6.84  6.72  6.62  6.54  6.47
 8  .25  1.54  1.66  1.67  1.66  1.66  1.65  1.64  1.64  1.63  1.63  1.63  1.62
    .10  3.46  3.11  2.92  2.81  2.73  2.67  2.62  2.59  2.56  2.54  2.52  2.50
    .05  5.32  4.46  4.07  3.84  3.69  3.58  3.50  3.44  3.39  3.35  3.31  3.28
    .01  11.3  8.65  7.59  7.01  6.63  6.37  6.18  6.03  5.91  5.81  5.73  5.67
 9  .25  1.51  1.62  1.63  1.63  1.62  1.61  1.60  1.60  1.59  1.59  1.58  1.58
    .10  3.36  3.01  2.81  2.69  2.61  2.55  2.51  2.47  2.44  2.42  2.40  2.38
    .05  5.12  4.26  3.86  3.63  3.48  3.37  3.29  3.23  3.18  3.14  3.10  3.07
    .01  10.6  8.02  6.99  6.42  6.06  5.80  5.61  5.47  5.35  5.26  5.18  5.11

df for
denom-                            df for numerator N1
inator
N2   Pr    20    24    30    40    50    60   100   120   200   500     ∞
 1  .25  9.58  9.63  9.67  9.71  9.74  9.76  9.78  9.80  9.82  9.84  9.85
    .10  61.7  62.0  62.3  62.5  62.7  62.8  63.0  63.1  63.2  63.3  63.3
    .05   248   249   250   251   252   252   253   253   254   254   254
 2  .25  3.43  3.43  3.44  3.45  3.45  3.46  3.47  3.47  3.48  3.48  3.48
    .10  9.44  9.45  9.46  9.47  9.47  9.47  9.48  9.48  9.49  9.49  9.49
    .05  19.4  19.5  19.5  19.5  19.5  19.5  19.5  19.5  19.5  19.5  19.5
    .01  99.4  99.5  99.5  99.5  99.5  99.5  99.5  99.5  99.5  99.5  99.5
 3  .25  2.46  2.46  2.47  2.47  2.47  2.47  2.47  2.47  2.47  2.47  2.47
    .10  5.18  5.18  5.17  5.16  5.15  5.15  5.14  5.14  5.14  5.14  5.13
    .05  8.66  8.64  8.62  8.59  8.58  8.57  8.55  8.55  8.54  8.53  8.53
    .01  26.7  26.6  26.5  26.4  26.4  26.3  26.2  26.2  26.2  26.1  26.1
 4  .25  2.08  2.08  2.08  2.08  2.08  2.08  2.08  2.08  2.08  2.08  2.08
    .10  3.84  3.83  3.82  3.80  3.80  3.79  3.78  3.78  3.77  3.76  3.76
    .05  5.80  5.77  5.75  5.72  5.70  5.69  5.66  5.66  5.65  5.64  5.63
    .01  14.0  13.9  13.8  13.7  13.7  13.7  13.6  13.6  13.5  13.5  13.5
 5  .25  1.88  1.88  1.88  1.88  1.88  1.87  1.87  1.87  1.87  1.87  1.87
    .10  3.21  3.19  3.17  3.16  3.15  3.14  3.13  3.12  3.12  3.11  3.10
    .05  4.56  4.53  4.50  4.46  4.44  4.43  4.41  4.40  4.39  4.37  4.36
    .01  9.55  9.47  9.38  9.29  9.24  9.20  9.13  9.11  9.08  9.04  9.02
 6  .25  1.76  1.75  1.75  1.75  1.75  1.74  1.74  1.74  1.74  1.74  1.74
    .10  2.84  2.82  2.80  2.78  2.77  2.76  2.75  2.74  2.73  2.73  2.72
    .05  3.87  3.84  3.81  3.77  3.75  3.74  3.71  3.70  3.69  3.68  3.67
    .01  7.40  7.31  7.23  7.14  7.09  7.06  6.99  6.97  6.93  6.90  6.88
 7  .25  1.67  1.67  1.66  1.66  1.66  1.65  1.65  1.65  1.65  1.65  1.65
    .10  2.59  2.58  2.56  2.54  2.52  2.51  2.50  2.49  2.48  2.48  2.47
    .05  3.44  3.41  3.38  3.34  3.32  3.30  3.27  3.27  3.25  3.24  3.23
    .01  6.16  6.07  5.99  5.91  5.86  5.82  5.75  5.74  5.70  5.67  5.65
 8  .25  1.61  1.60  1.60  1.59        1.59  1.58  1.58  1.58  1.58  1.58
    .10  2.42  2.40  2.38  2.36        2.34  2.32  2.32  2.31  2.30  2.29
    .05  3.15  3.12  3.08  3.04        3.01  2.97  2.97  2.95  2.94  2.93
    .01  5.36  5.28  5.20  5.12        5.03  4.96  4.95  4.91  4.88  4.86
 9  .25  1.56  1.56  1.55  1.55        1.54  1.53  1.53  1.53  1.53  1.53
    .10  2.30  2.28  2.25  2.23        2.21  2.19  2.18  2.17  2.17  2.16
    .05  2.94  2.90  2.86  2.83        2.79  2.76  2.75  2.73  2.72  2.71
    .01  4.81  4.73  4.65  4.57        4.48        4.40  4.36  4.33  4.31

Source: From E. S. Pearson and H. O. Hartley, eds., Biometrika Tables for Statisticians, vol. 1, 3d ed., table 18, Cambridge University Press, New York, 1966. 
Reproduced by permission of the editors and trustees of Biometrika. 
Table D.3 Upper Percentage Points of the F Distribution (Continued) 


df for 


denom-/ df for numerator N; 
inator hm we 

N | Pr 1 2 3 4 5 6 Z 8 9 10 11 12 
[25 149 160 1.60 1.59 1.59 1.58 1.57 1.56 156 155 1.55 154 

| 10 329 292 273 261 252 246 241) 238 23 E E acer 

10 105 496 410 3.71 348 33300322. 314 307 3.02) | en) 204 nol 
01 10.0 7.56 655 5.99 564 539 5.20 5.06 494 4.85 4.77 471 

|25 447 1158 158 157 Tse 1:55 (1554) is Teo 

|.10 323 286 266 92:54 245 239 234 20 O22 mon 2 
"105 4s4 398 3.59 3.36 320 3.09 301 295 ©9290 Mees ee unas 
/01 965 7.21 622 567 532 5.07 489 474 463 4.54 446 4.40 

25 146 156 156 ass 1154 1.53 152 â 15i dist aso) o aes 

12 |D pis 281 261 24a 289 233 228 Gees 23 DU ea? Weis 
[05 475 389 349 326 311 300 291 285 280 275 272 269 

|.01 933 693 595 541 5.06 482 464 450 439 430 422 4.16 

|.25 45 155 155 153 152 Tsi -Tso 49 Wo ee? 

i [10 344 276 256 243 235 228 223 (220 Pi rE 
105 467 381 341 348 303 2:92 283 27 “271 “er To 

o 907 670 574 521 486 462 444 430 419 410 402 3.96 

|.25 144 153 153 152 151 150 149 qas ia 146 moe meee 

a {0 3HO 273° 252 239 2an 224 219 “ats Ee 
1.05 460 374 334 311 296 285 276 270 265 20o 2 253 

/.01 886 651 5.56 5.04 469 4.46 428 414 403 3.94 386 3.80 

125 143 52 152 1.51. 149° 148 147 ias hae e | ee ee 

15 [10 307 270 249 236 227 -221 216 212 -209 206 204 202 
1.05 454 368 329 306 290 279 271 264 259 254 251 248 

1.01 868 636 542 489 456 432 414 400 389 380 3.73 3.67 

(25. 142 Sl 151 1.50 1.48 1.47 1.46 145 Siaa a eee 

16 [10 305 267 246 233 224 218 213 209 206 203 201 1.99 
05 449 3.63 3.24 30 285 274 266 259 254 249 246 242 

"01 853 623 5.29 4.77 444 4.20 403 389 3.78 369 362 355 

|.25 142 1.51 150 149 147 146 145 1.44 143 «1.43 «+142 «141 

o [10-303 264 24 231 222 215 210 2% 20o 20 es lias 
(05 445 359 320 296 281 270 261 255 249 245, 241° 238 

0 840 611 5.18 467 434 410 393 3.79 368 359 352 346 

|.25 141 150 149 148 146 145 144 «1.43 142 «+4142 «+2141 ~ «1.40 

1g (J0 30 262 242 229 290 213 2o Dos bo mo n 
[5 441 355 316 293 277 266 253 Da Le 94 l E. 

|01 829 601 509 458 425 401 384 3.71 360 351 343 337 

[25 41 149 149 Way 146 1.44-9%43 dae alam ayes cae 

19 |10 299 261 240 227 218 211 206 202 m2m E Mir 
1.05 438 352 313 290 274 263 9954 248 24m2 ga eee 

a 818 593 5.01 450 417 394 397 363 (352009430 aa, Men 

25 140 149 148 146 145 144 143 142 +141 «+1440 139 1.39 

29 |70 297 259 2383 225 216 209 204 200 196 1.94 192 189 
[05 435 349 3.10 287 271 260 251 245 23 2355 2] oe 

(01 810 585 4.94 443 410 3.87 3.70 356 3.46 337 329 33 


24 30 40 50 
152 IS 151 1.50 
2.18 2.16 2313 202 
2.74 2.70 2.66 2.64 
433 4.25 4.17 4.12 
1.49 1.48 1.47 1.47 
2.10 2.08 2.05 2.04 
2.61 257 253 25] 
4.02 3.94 3.86 3.81 
1.46 1.45 1.45 1.44 
2.04 2.01 1.99 1.97 
2.51 2.47 2.43 2.40 
3.78 3.70 3:62 3:57 
1.44 1.43 1.42 1.42 
1.98 1.96 wS 192 
2.42 2.38 234 21 
3.59 3.51 343 3:38 
1.42 1.41 1.41 1.40 
1.94 191 1.89 1.87 
2.35 2.31 2.27 2.24 
3.43 325 3:27 322 
1.41 1.40 89 139 
1.90 1.87 1.85, 1.83 
229 225 2.20 2.18 
3929 3.21 3.13 3.08 
1.39 1.38 137 137 
1.87 1.84 1.81 1.79 
2.24 2.19 205 202 
3.18 3.10 302 297 
1.38 1.37 1.36 135 
1.84 1.81 1.78 1.76 
219 2515 2.10 2.08 
3.08 3.00 2.92. 2.87 
1.37 1.36 1.35 1.34 
1.81 1.78 15 1.74 
205. 2a 2.06 2.04 
3.00 2.92 2.84 2.78 
1.36 1.35 1.34 1.33 
E79 1.76 T73 71 
2.11 2.07 2.03 2.00 
2.92 2.84 276 271 
1.35 1.34 1s 238 
1.77 1.74 1.71 1.69 
2.08 2.04 UEL Yer 
2.86 2.78 2.69 2.64 


60 


5150 


2.11 
2.62 
4.08 


1.47 
2.03 
2.49 
3.78 


1.44 
1.96 
2.38 
3.54 


1.42 
1.90 
2.30 
3.34 


1.40 
1.86 
2.22 
3.18 


1.38 
1.82 
2.16 
3.05 


1.36 
1.78 
2m 
2.93 


1.35 
1.75 
2.06 
2.83 


1.34 
1.72 
2.02 
2.75 
1.33 
1.70 
1.98 
2.67 


1.32 
1.68 
1:253 
2.61 


df for numerator N; 


100 


1.49 
2.09 
2.59 
4.01 


1.46 
2.00 
2.46 
3.71 


1.43 
1.94 
235 
3.47 


1.41 
1.88 
2.26 
3.27 


139 
1.83 
2m9 
3.11 


1.38 
1.79 
2.12 
2.98 


1.36 
1.76 
2.07 
2.86 


1.34 
1.73 
2.02 
2.76 


133 
1.70 
1.98 
2.68 
1.32 
1.67 
1.94 
2.60 


1.31 
1.65 
1.91 
2.54 


120 


1.49 


2.08 
2.58 
4.00 


1.46 
2.00 
2.45 
3.69 


1.43 
T93 
2.34 
3.45 


1.41 
1.88 
2.25 
3.25 


1839 
1.83 
2.18 
3.09 
t37 
1.79 
2.11 
2.96 


1.35 
1.75 
2.06 
2.84 


1.34 
1.72 
2.01 
215 


1.33 
1.69 
1897 
2.66 
132 
1.67 
193 
2.58 


1.31 
1.64 
1.90 
2.52 
| df for 
| denom- 


inator 
Pr N2 


200 500 oo 

1.49 1.48 1.48 
2.07 2.06 2.06 
2.56 2.55 2.54 
3.96 3393 3:91 
1.46 1.45 1.45 
1.99 1.98 1.97 
2.43 2.42 2.40 
3.66 3.62 3.60 
1.43 1.42 1.42 
ley w91 1.90 
2.32 231 230 
3.41 3.38 3.36 
1.40 1.40 1.40 
1.86 Tes ESS 
2.23 2,22. 22i 
3.22 319 Sie 
1.39 138 1:38 
1.82 1.80 1.80 
2.16 2.14 2:13 
3.06 3.03 3.00 
37 1.36 1.36 
1.77 1.76 1.76 
2.10 2.08 2.07 
2.92 2.89 2.87 
1:35 1.34 1.34 
1.74 173 172 
2.04 202 201 
2.81 238 275 
1.34 3 T3 
1-21 1.69 1.69 
1.99 1.97 1.96 
2.71 2.68 2.65 
1.32 132 12 
1.68 1.67 1.66 
1.95 193. 192 
2.62 2359) 257 
1.31 1.31 1.30 
1.65 1.64 1.63 
1.91 1.89 1.88 
2.55 2.51 2.49 
1.30 10 129 
1.63 1.62 1.61 
1.88 1.86 1.84 
2.48 2.44 2.42 


oH - 


10 
| 11 


I2 


15 


17 
18 
19 


| 
e 
n 
Table D.3 Upper Percentage Points of the F Distribution (Continued) 


df for - 


denom-; df for numerator N; 
inator -—-—~~------— - ————- anion a -a 
N2 Pr 1 2 3 4 5 6 7 8 9 10 11 12 
~ a 140 148 +147 «+145 o 1.42 A o “Wee! o e 
„œ fo 295 256 235 202 293 205 201 197 We meo We wi 
22-050 4803.44 305 282 z266 "255 ae 2mo 23a 290 26 25 
[on zose 572 «482 481 399 i36 359 3m5 385 Ske ons Me 
l 139 ia wae i4 1s Ia 140o e9 Jee 6 ee DEA ee 
lo 293 254 233 219 210 204 198 194 191 188 185 183 
24705 426 340 301 278 262 251 242 236 230 225 221 218 
Loi 7:82 561 472 422 390 367 3160 336 326 a7 Gamo o 
2s wee 146 145 #4144 142 Tam GeO elas 157 Wey Mies. aes 
Lio 291 252 231 217 208 207 196 1.92 Wes 186 184 181 
76" (05 423 337 298 274 259 247 239 232 27 22 œk i 
OT 7:72 5:53 464 414 382 359 342 329 318 309 302 296 
125 1438 146 145 143 141 140 139 138 137 136 135 134 
s i10 289 250 229 216 206 200 194 190 187 184 181 1.79 
1.05 420 3.34 295 271 256 245 236 229 224 219 215 212 
Jol 7.64 545 457 407 375 353 3136 323 342 3103 %96 200 
(25 138 145 144 142 141 139 1.38 1.37 136 135 135 134 
39 J0 288 249 2283 2ī4 205 198 193 1a qes ie mo T 
105 417 332 292 269 253 242 233 227 221 216 «213 209 
01 7.56 5.39 4.51 402 370 347 330 317 307 298 291 284 
1.25 136 144 142 140 139 137 136 ibs 164 qB 4S2 6S 
a [a0 284 244 223 209 200 93 Wez 183 Wvo We "3 Wi 
l O5 408 323 284 261 245 234 225 218 212 208 204 200 
[OI 7.31 S18 431 3.83 351 3.29 312 299 289 280 273 266 
1.25 135 42 141 138 4137 «135 s Ia 131 «130 «129 «129 
æ 10 279 239 238 204 195 187 182 17 au 1 m H 
1.05 400 315 276 253 237 225 217 210 204 199 195 192 
[01 7.08 498 413 365 3.34 312 295 282 272 263 256 250 
|25 134 140 139 1.37 1.35 1.33 1.31 130 1.29 1.28 127 11.26 
wo "O eC o E 
|05 3.92 3.07 268 245 229 217 209 202 196 19 187 183 
or 685 479° 395 348 3.17 296 279 266 256 247 240 234 
25° 133° 139 138 1.36 W34 132 13) 1%- w28 17 Ss 
30 70 273 233 201 17 we qdo dms wo moo oe 4 
|05 389 304 265 242 226 244 206 we ea mo ~% Glen 
| 01 676 471 388 341 3.11 289 273° 260 250 24) 234 297 
25 waz wmo 137 185 Ue Gr GO Gee gay ae “Gee m 
S | 10 271 230 208 1% es 17v “ae “te, “ames o e S 
|05 384 300 260 237 221 210 20 164 Ges is W ies 
[01 663 461 3.78 3.32 302 280 264 251 241 232 225 218 


15 20 24 30 40 50 


1.36 1.34 1.33 132 esti JE 
1.81 1.76 Nez 1.70 1.67 1.65 
ZS 2.07 2.03 1.98 1.94 1.91 
2.98 2.83 2.75 267 2.58 2.53 


1.35 1.33 1.32 1.31 1.30 1.29 
1.78 1.73 1.70 1.67 1.64 1.62 
211 2.03 1.98 1.94 1.89 1.86 
2.89 274 2.66 2.58 2.49 2.44 


1.34 1.32 ian 1.30 C2928 
1.76 1.71 1.68 1.65 1.61 1.59 
2.07 1.99 1.95 1.90 1.85 1.82 
2.81 2.66 2.58 2.50 2.42 2.36 


1.33 1.31 1.30 1.29 1.28 1.27 
1.74 1.69 1.66 1.63 isk) 1-57 
2.04 1.96 1.91 1.87 1.82 1.79 
2.75 2.60 2.52 2.44 2.35 2.30 


1.32 1.30 129 1.28 T2726 
172 1.67 1.64 1.61 U IS 
2.01 1.93 1.89 1.84 1.79 1.76 
2.70 2.55 2.47 2.39 2.30 2.25 


1.30 1.28 1.26 1.25 1.24 1.23 
1.66 1.61 1.57 1.54 1.51 > 1.48 


60 


1.30 
1.64 
1.89 
2.50 


1.29 
1.61 
1.84 
2.40 


1.28 
1.58 
1.80 
2.33 


1.27 
1.56 
eae 
2.26 


1.26 
1.54 
1.74 
221 


122 
1.47 
1.64 
2.02 


119 
1.40 
T.53 
1.84 


1.16 
T32 
1.43 
1.66 


elles 
1.28 
1.39 
1.58 


112 
1.24 
132 
1.47 


df for numerator N; 


100 
1.30 
1.61 


1.85 
2.42 


1.28 
1.58 


120 


1.30 
1.60 
1.84 
2.40 


1.28 
1S7 
T79 
2.31 
1.26 
1.54 
1.75 
2.23 


1.25 
152 
1.71 
2.17 


1.24 
1.50 
1.68 
2.11 
1.21 
1.42 
1.58 
192 


Use 
1.35 
1.47 
ies} 


1.13 
1.26 
135) 
1.53 


1.10 
i22 
129 
1.44 
1.08 


1.17 
22 


1.32 
| df for 
| denom- 
— ——— inator 
500 oo iPr | N; 
29 eet PGi? 5u9) 
1.58 1.57 .10 
1.80 1.78 05 | 22 
2332] Ol 
1.27 126 25l 
1.54 1.53 10 | 
a 0seeee 
2.24 221 01 | 
125 se. 25: | 
151 150 .10 
171 1.69 05 | 26 
2.16 213 01 
1.24 1.24 25 | 
1.49 1.48 .10 | 
1.67 165 05| 7 
2.09 206 .01 
123023 25 | 
1.47 146 10 
1.64 162 05 | 2 
2.03 2.01 0 
1.19 1.19 25 
1.39 1.38.10 | 
1.53 ey 0s 
1.83 1.80 .01 | 
115 115 25 | 
1.31 1.29 .10 | 
141 139 5 | 6 
1.63 1.60 01 | 
111 110 25 | 
1.21 199 T 
128 125 ose 
1.42 138 01 | 
1.08 1.06 25 | 
1.17 114 .10 | 
122 119 o5 | 200 
1.33 1.28 0t | 
1.04 1.00 .25 
oo 
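The Table D.3 example (N1 = 10, N2 = 9) can be checked in a similar way. The rough standard-library sketch below integrates the F density with Simpson's rule and subtracts the result from 1. For simplicity it assumes N1 > 2, where the density is 0 at the origin. It is an illustration under that assumption, not the method used for the published table.

```python
import math


def f_pdf(x, d1, d2):
    """Density of F(d1, d2), computed on the log scale for numerical stability."""
    if x <= 0:
        return 0.0
    log_c = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2) - math.lgamma(d2 / 2)
             + (d1 / 2) * math.log(d1 / d2))
    return math.exp(log_c + (d1 / 2 - 1) * math.log(x)
                    - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2))


def f_upper_tail(x, d1, d2, steps=4000):
    """P(F > x) = 1 - integral of the density from 0 to x, by Simpson's rule (d1 > 2 assumed)."""
    if d1 <= 2:
        raise ValueError("this sketch assumes d1 > 2")
    h = x / steps
    s = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - s * h / 3


if __name__ == "__main__":
    for crit in (1.59, 2.42, 3.14, 5.26):  # the example at the head of Table D.3
        print(crit, round(f_upper_tail(crit, 10, 9), 3))
```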
Table D.4 Upper Percentage Points of the χ² Distribution 

Example 
Pr(χ² > 10.85) = 0.95 
Pr(χ² > 23.83) = 0.25    for df = 20 
Pr(χ² > 31.41) = 0.05 

df\Pr          .995          .990          .975          .950          .900          .750
    1  392704×10⁻¹⁰   157088×10⁻⁹   982069×10⁻⁹   393214×10⁻⁸      .0157908      .1015308
    2      .0100251      .0201007      .0506356       .102587       .210720       .575364
    3      .0717212       .114832       .215795       .351846       .584375      1.212534
    4       .206990       .297110       .484419       .710721      1.063623       1.92255
    5       .411740       .554300       .831211      1.145476       1.61031       2.67460
    6       .675727       .872085      1.237347       1.63539       2.20413       3.45460
    7       .989265      1.239043       1.68987       2.16735       2.83311       4.25485
    8      1.344419      1.646482       2.17973       2.73264       3.48954       5.07064
    9      1.734926      2.087912       2.70039       3.32511       4.16816       5.89883
   10       2.15585       2.55821       3.24697       3.94030       4.86518       6.73720
   11       2.60321       3.05347       3.81575       4.57481       5.57779       7.58412
   12       3.07382       3.57056       4.40379       5.22603       6.30380       8.43842
   13       3.56503       4.10691       5.00874       5.89186       7.04150       9.29906
   14       4.07468       4.66043       5.62872       6.57063       7.78953       10.1653
   15       4.60094       5.22935       6.26214       7.26094       8.54675       11.0365
   16       5.14224       5.81221       6.90766       7.96164       9.31223       11.9122
   17       5.69724       6.40776       7.56418       8.67176       10.0852       12.7919
   18       6.26481       7.01491       8.23075       9.39046       10.8649       13.6753
   19       6.84398       7.63273       8.90655       10.1170       11.6509       14.5620
   20       7.43386       8.26040       9.59083       10.8508       12.4426       15.4518
   21       8.03366       8.89720      10.28293       11.5913       13.2396       16.3444
   22       8.64272       9.54249       10.9823       12.3380       14.0415       17.2396
   23       9.26042      10.19567       11.6885       13.0905       14.8479       18.1373
   24       9.88623       10.8564       12.4011       13.8484       15.6587       19.0372
   25       10.5197       11.5240       13.1197       14.6114       16.4734       19.9393
   26       11.1603       12.1981       13.8439       15.3791       17.2919       20.8434
   27       11.8076       12.8786       14.5733       16.1513       18.1138       21.7494
   28       12.4613       13.5648       15.3079       16.9279       18.9392       22.6572
   29       13.1211       14.2565       16.0471       17.7083       19.7677       23.5666
   30       13.7867       14.9535       16.7908       18.4926       20.5992       24.4776
   40       20.7065       22.1643       24.4331       26.5093       29.0505       33.6603
   50       27.9907       29.7067       32.3574       34.7642       37.6886       42.9421
   60       35.5346       37.4848       40.4817       43.1879       46.4589       52.2938
   70       43.2752       45.4418       48.7576       51.7393       55.3290       61.6983
   80       51.1720       53.5400       57.1532       60.3915       64.2778       71.1445
   90       59.1963       61.7541       65.6466       69.1260       73.2912       80.6247
 100*       67.3276       70.0648       74.2219       77.9295       82.3581       90.1332

df\Pr      .500      .250      .100      .050      .025      .010      .005
    1   .454937   1.32330   2.70554   3.84146   5.02389   6.63490   7.87944
    2   1.38629   2.77259   4.60517   5.99147   7.37776   9.21034   10.5966
    3   2.36597   4.10835   6.25139   7.81473   9.34840   11.3449   12.8381
    4   3.35670   5.38527   7.77944   9.48773   11.1433   13.2767   14.8602
    5   4.35146   6.62568   9.23635   11.0705   12.8325   15.0863   16.7496
    6   5.34812   7.84080   10.6446   12.5916   14.4494   16.8119   18.5476
    7   6.34581   9.03715   12.0170   14.0671   16.0128   18.4753   20.2777
    8   7.34412   10.2188   13.3616   15.5073   17.5346   20.0902   21.9550
    9   8.34283   11.3887   14.6837   16.9190   19.0228   21.6660   23.5893
   10   9.34182   12.5489   15.9871   18.3070   20.4831   23.2093   25.1882
   11   10.3410   13.7007   17.2750   19.6751   21.9200   24.7250   26.7569
   12   11.3403   14.8454   18.5494   21.0261   23.3367   26.2170   28.2995
   13   12.3398   15.9839   19.8119   22.3621   24.7356   27.6883   29.8194
   14   13.3393   17.1170   21.0642   23.6848   26.1190   29.1413   31.3193
   15   14.3389   18.2451   22.3072   24.9958   27.4884   30.5779   32.8013
   16   15.3385   19.3688   23.5418   26.2962   28.8454   31.9999   34.2672
   17   16.3381   20.4887   24.7690   27.5871   30.1910   33.4087   35.7185
   18   17.3379   21.6049   25.9894   28.8693   31.5264   34.8053   37.1564
   19   18.3376   22.7178   27.2036   30.1435   32.8523   36.1908   38.5822
   20   19.3374   23.8277   28.4120   31.4104   34.1696   37.5662   39.9968
   21   20.3372   24.9348   29.6151   32.6705   35.4789   38.9321   41.4010
   22   21.3370   26.0393   30.8133   33.9244   36.7807   40.2894   42.7956
   23   22.3369   27.1413   32.0069   35.1725   38.0757   41.6384   44.1813
   24   23.3367   28.2412   33.1963   36.4151   39.3641   42.9798   45.5585
   25   24.3366   29.3389   34.3816   37.6525   40.6465   44.3141   46.9278
   26   25.3364   30.4345   35.5631   38.8852   41.9232   45.6417   48.2899
   27   26.3363   31.5284   36.7412   40.1133   43.1944   46.9630   49.6449
   28   27.3363   32.6205   37.9159   41.3372   44.4607   48.2782   50.9933
   29   28.3362   33.7109   39.0875   42.5569   45.7222   49.5879   52.3356
   30   29.3360   34.7998   40.2560   43.7729   46.9792   50.8922   53.6720
   40   39.3354   45.6160   51.8050   55.7585   59.3417   63.6907   66.7659
   50   49.3349   56.3336   63.1671   67.5048   71.4202   76.1539   79.4900
   60   59.3347   66.9814   74.3970   79.0819   83.2976   88.3794   91.9517
   70   69.3344   77.5766   85.5271   90.5312   95.0231   100.425   104.215
   80   79.3343   88.1303   96.5782   101.879   106.629   112.329   116.321
   90   89.3342   98.6499   107.565   113.145   118.136   124.116   128.299
 100*   99.3341   109.141   118.498   124.342   129.561   135.807   140.169

*For df greater than 100, the expression $\sqrt{2\chi^2} - \sqrt{2k - 1} = Z$ follows the standardized normal distribution, where k represents the degrees of freedom. 

Source: Abridged from E. S. Pearson and H. O. Hartley, eds., Biometrika Tables for Statisticians, vol. 1, 3d ed., table 8, Cambridge University Press, New York, 1966. 
Reproduced by permission of the editors and trustees of Biometrika. 
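For even degrees of freedom k, the χ² upper tail has the closed form exp(-x/2) Σ_{i=0}^{k/2-1} (x/2)^i / i!. This gives an exact check on the df = 20 example and on the .050 column for df = 2. The second function evaluates the footnote's large-df approximation. The sketch below is a minimal standard-library illustration; it handles odd df only through the approximation.

```python
import math


def chi2_upper_tail_even(x, k):
    """P(chi-square_k > x) for even k, via exp(-x/2) * sum_{i=0}^{k/2 - 1} (x/2)^i / i!."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, k // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def large_df_z(chi2, k):
    """Footnote approximation for df > 100: sqrt(2 chi2) - sqrt(2k - 1) is roughly standard normal."""
    return math.sqrt(2.0 * chi2) - math.sqrt(2.0 * k - 1.0)


if __name__ == "__main__":
    print(round(chi2_upper_tail_even(31.4104, 20), 4))  # 0.05
    print(round(chi2_upper_tail_even(10.8508, 20), 4))  # 0.95
    print(round(large_df_z(124.342, 100), 3))           # 1.663, close to z = 1.645
```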
Table D.5A Durbin-Watson d Statistic: Significance Points of dL and dU at 0.05 Level of Significance 

         k=1         k=2         k=3         k=4         k=5         k=6         k=7         k=8         k=9        k=10
   n    dL    dU    dL    dU    dL    dU    dL    dU    dL    dU    dL    dU    dL    dU    dL    dU    dL    dU    dL    dU
   6 0.610 1.400     -     -     -     -     -     -     -     -     -     -     -     -     -     -     -     -     -     -
   7 0.700 1.356 0.467 1.896     -     -     -     -     -     -     -     -     -     -     -     -     -     -     -     -
   8 0.763 1.332 0.559 1.777 0.368 2.287     -     -     -     -     -     -     -     -     -     -     -     -     -     -
   9 0.824 1.320 0.629 1.699 0.455 2.128 0.296 2.588     -     -     -     -     -     -     -     -     -     -     -     -
  10 0.879 1.320 0.697 1.641 0.525 2.016 0.376 2.414 0.243 2.822     -     -     -     -     -     -     -     -     -     -
  11 0.927 1.324 0.758 1.604 0.595 1.928 0.444 2.283 0.316 2.645 0.203 3.005     -     -     -     -     -     -     -     -
  12 0.971 1.331 0.812 1.579 0.658 1.864 0.512 2.177 0.379 2.506 0.268 2.832 0.171 3.149     -     -     -     -     -     -
  13 1.010 1.340 0.861 1.562 0.715 1.816 0.574 2.094 0.445 2.390 0.328 2.692 0.230 2.985 0.147 3.266     -     -     -     -
  14 1.045 1.350 0.905 1.551 0.767 1.779 0.632 2.030 0.505 2.296 0.389 2.572 0.286 2.848 0.200 3.111 0.127 3.360     -     -
  15 1.077 1.361 0.946 1.543 0.814 1.750 0.685 1.977 0.562 2.220 0.447 2.472 0.343 2.727 0.251 2.979 0.175 3.216 0.111 3.438
  16 1.106 1.371 0.982 1.539 0.857 1.728 0.734 1.935 0.615 2.157 0.502 2.388 0.398 2.624 0.304 2.860 0.222 3.090 0.155 3.304
  17 1.133 1.381 1.015 1.536 0.897 1.710 0.779 1.900 0.664 2.104 0.554 2.318 0.451 2.537 0.356 2.757 0.272 2.975 0.198 3.184
  18 1.158 1.391 1.046 1.535 0.933 1.696 0.820 1.872 0.710 2.060 0.603 2.257 0.502 2.461 0.407 2.667 0.321 2.873 0.244 3.073
  19 1.180 1.401 1.074 1.536 0.967 1.685 0.859 1.848 0.752 2.023 0.649 2.206 0.549 2.396 0.456 2.589 0.369 2.783 0.290 2.974
  20 1.201 1.411 1.100 1.537 0.998 1.676 0.894 1.828 0.792 1.991 0.692 2.162 0.595 2.339 0.502 2.521 0.416 2.704 0.336 2.885
  21 1.221 1.420 1.125 1.538 1.026 1.669 0.927 1.812 0.829 1.964 0.732 2.124 0.637 2.290 0.547 2.460 0.461 2.633 0.380 2.806
  22 1.239 1.429 1.147 1.541 1.053 1.664 0.958 1.797 0.863 1.940 0.769 2.090 0.677 2.246 0.588 2.407 0.504 2.571 0.424 2.734
  23 1.257 1.437 1.168 1.543 1.078 1.660 0.986 1.785 0.895 1.920 0.804 2.061 0.715 2.208 0.628 2.360 0.545 2.514 0.465 2.670
  24 1.273 1.446 1.188 1.546 1.101 1.656 1.013 1.775 0.925 1.902 0.837 2.035 0.751 2.174 0.666 2.318 0.584 2.464 0.506 2.613
  25 1.288 1.454 1.206 1.550 1.123 1.654 1.038 1.767 0.953 1.886 0.868 2.012 0.784 2.144 0.702 2.280 0.621 2.419 0.544 2.560
  26 1.302 1.461 1.224 1.553 1.143 1.652 1.062 1.759 0.979 1.873 0.897 1.992 0.816 2.117 0.735 2.246 0.657 2.379 0.581 2.513
  27 1.316 1.469 1.240 1.556 1.162 1.651 1.084 1.753 1.004 1.861 0.925 1.974 0.845 2.093 0.767 2.216 0.691 2.342 0.616 2.470
  28 1.328 1.476 1.255 1.560 1.181 1.650 1.104 1.747 1.028 1.850 0.951 1.958 0.874 2.071 0.798 2.188 0.723 2.309 0.650 2.431
  29 1.341 1.483 1.270 1.563 1.198 1.650 1.124 1.743 1.050 1.841 0.975 1.944 0.900 2.052 0.826 2.164 0.753 2.278 0.682 2.396
  30 1.352 1.489 1.284 1.567 1.214 1.650 1.143 1.739 1.071 1.833 0.998 1.931 0.926 2.034 0.854 2.141 0.782 2.251 0.712 2.363
  31 1.363 1.496 1.297 1.570 1.229 1.650 1.160 1.735 1.090 1.825 1.020 1.920 0.950 2.018 0.879 2.120 0.810 2.226 0.741 2.333
  32 1.373 1.502 1.309 1.574 1.244 1.650 1.177 1.732 1.109 1.819 1.041 1.909 0.972 2.004 0.904 2.102 0.836 2.203 0.769 2.306
  33 1.383 1.508 1.321 1.577 1.258 1.651 1.193 1.730 1.127 1.813 1.061 1.900 0.994 1.991 0.927 2.085 0.861 2.181 0.795 2.281
  34 1.393 1.514 1.333 1.580 1.271 1.652 1.208 1.728 1.144 1.808 1.080 1.891 1.015 1.979 0.950 2.069 0.885 2.162 0.821 2.257
  35 1.402 1.519 1.343 1.584 1.283 1.653 1.222 1.726 1.160 1.803 1.097 1.884 1.034 1.967 0.971 2.054 0.908 2.144 0.845 2.236
  36 1.411 1.525 1.354 1.587 1.295 1.654 1.236 1.724 1.175 1.799 1.114 1.877 1.053 1.957 0.991 2.041 0.930 2.127 0.868 2.216
  37 1.419 1.530 1.364 1.590 1.307 1.655 1.249 1.723 1.190 1.795 1.131 1.870 1.071 1.948 1.011 2.029 0.951 2.112 0.891 2.198
  38 1.427 1.535 1.373 1.594 1.318 1.656 1.261 1.722 1.204 1.792 1.146 1.864 1.088 1.939 1.029 2.017 0.970 2.098 0.912 2.180
  39 1.435 1.540 1.382 1.597 1.328 1.658 1.273 1.722 1.218 1.789 1.161 1.859 1.104 1.932 1.047 2.007 0.990 2.085 0.932 2.164
  40 1.442 1.544 1.391 1.600 1.338 1.659 1.285 1.721 1.230 1.786 1.175 1.854 1.120 1.924 1.064 1.997 1.008 2.072 0.952 2.149
  45 1.475 1.566 1.430 1.615 1.383 1.666 1.336 1.720 1.287 1.776 1.238 1.835 1.189 1.895 1.139 1.958 1.089 2.022 1.038 2.088
  50 1.503 1.585 1.462 1.628 1.421 1.674 1.378 1.721 1.335 1.771 1.291 1.822 1.246 1.875 1.201 1.930 1.156 1.986 1.110 2.044
  55 1.528 1.601 1.490 1.641 1.452 1.681 1.414 1.724 1.374 1.768 1.334 1.814 1.294 1.861 1.253 1.909 1.212 1.959 1.170 2.010
  60 1.549 1.616 1.514 1.652 1.480 1.689 1.444 1.727 1.408 1.767 1.372 1.808 1.335 1.850 1.298 1.894 1.260 1.939 1.222 1.984
  65 1.567 1.629 1.536 1.662 1.503 1.696 1.471 1.731 1.438 1.767 1.404 1.805 1.370 1.843 1.336 1.882 1.301 1.923 1.266 1.964
  70 1.583 1.641 1.554 1.672 1.525 1.703 1.494 1.735 1.464 1.768 1.433 1.802 1.401 1.837 1.369 1.873 1.337 1.910 1.305 1.948
  75 1.598 1.652 1.571 1.680 1.543 1.709 1.515 1.739 1.487 1.770 1.458 1.801 1.428 1.834 1.399 1.867 1.369 1.901 1.339 1.935
  80 1.611 1.662 1.586 1.688 1.560 1.715 1.534 1.743 1.507 1.772 1.480 1.801 1.453 1.831 1.425 1.861 1.397 1.893 1.369 1.925
  85 1.624 1.671 1.600 1.696 1.575 1.721 1.550 1.747 1.525 1.774 1.500 1.801 1.474 1.829 1.448 1.857 1.422 1.886 1.396 1.916
  90 1.635 1.679 1.612 1.703 1.589 1.726 1.566 1.751 1.542 1.776 1.518 1.801 1.494 1.827 1.469 1.854 1.445 1.881 1.420 1.909
  95 1.645 1.687 1.623 1.709 1.602 1.732 1.579 1.755 1.557 1.778 1.535 1.802 1.512 1.827 1.489 1.852 1.465 1.877 1.442 1.903
 100 1.654 1.694 1.634 1.715 1.613 1.736 1.592 1.758 1.571 1.780 1.550 1.803 1.528 1.826 1.506 1.850 1.484 1.874 1.462 1.898
 150 1.720 1.746 1.706 1.760 1.693 1.774 1.679 1.788 1.665 1.802 1.651 1.817 1.637 1.832 1.622 1.847 1.608 1.862 1.594 1.877
 200 1.758 1.778 1.748 1.789 1.738 1.799 1.728 1.810 1.718 1.820 1.707 1.831 1.697 1.841 1.686 1.852 1.675 1.863 1.665 1.874


n 


16 
17 
18 
19 


200 


d 


0.098 
0.138 
0.177 
0.220 
0.263 
0.307 
0.349 
0.391 
0.431 
0.470 
0.508 
0.544 
0.578 
0.612 
0.643 
0.674 
0.703 
0.731 
0.758 
0.783 
0.808 
0.831 
0.854 
0.875 
0.896 
0.988 
1.064 
1.129 
1.184 
1.231 
1.272 
1,308 
1.340 
1,369 
1.395 
1.418 
1.439 
1.579 
1.654 


dy 


3,503 


3.378 
3.265 
3.159 
3.063 
2.976 
2.897 
2.826 
2.761 
2.702 
2.649 
2.600 
2.555 
2.515 
2.477 
2.443 
2.411 
2.382 
2.355 
2.330 
2.306 
2.285 
2.265 
2.246 
2.228 
2.156 
2.103 
2.062 
2.031 
2.006 
1.986 
1.970 
1.957 
1.946 
1.937 
1.929 
1.923 
1.892 
1.885 


0.087 
0.123 
0.160 
0.200 
0.240 
0.281 
0.322 
0.362 
0.400 
0.438 
0.475 
0.510 
0.544 
0.577 
0.608 
0.638 
0.668 
0.695 
0.722 
0.748 
0.772 
0.796 
0.819 
0.840 
0.938 
1.019 
1.087 
1.145 
1.195 
1.239 
1.277 
1.311 
1,342 
1.369 
1.394 
1.416 
1.564 
1.643 


3.557 
3.441 
3.335 
3.234 
3.141 
3.057 
2.979 
2.908 
2.844 
2.784 
2.730 
2.680 
2.634 
2.592 
2.553 
2.517 
2.484 
2.454 
2.425 
2.398 
2.374 
2.351 
2.329 
2.309 
2.225 
2.163 
2.116 
2.079 
2.049 
2.026 
2.006 
1.991 
WIIF 
1.966 
1.956 
1.948 
1.908 
1.896 


0.078 
0.111 
0.145 
0.182 
0.220 
0.259 
0.297 
0.335 
0.373 
0.409 
0.445 
0.479 
0.512 
0.545 
0.576 
0.606 
0.634 
0.662 
0.689 
0.714 
0.739 
0.763 
0.785 
0.887 
0.973 
1.045 
1.106 
1.160 
1.206 
1.247 
1.283 
1.315 
1.344 
1.370 
1.393 
1.550 
1.632 


3.603 
3.496 
3.395 
3.300 
3.211 
3.128 
3.053 
2.983 
2.919
2.859 
2.805 
2.755 
2.708 
2.665 
2.625 
2.588 
2.554 
2.521 
2.492 
2.464 
2.438 
2.413 
2.391 
2.296 
2.225 
2.170 
2.127 
2.093 
2.066 
2.043 
2.024 
2.009 
1.995 
1.984 
1.974 
1.924 
1.908 


0.070 
0.100 
0.132 
0.166 
0.202 
0.239 
0.275 
0.312
0.348
0.383
0.418 
0.451 
0.484 
0.515 
0.546 
0.575 
0.604 
0.631 
0.657 
0.683 
0.707 
0.731 
0.838 
0.927 
1.003 
1.068
1.124 
1.172
1.215 
1.253 
1.287 
1.318 
1.345 
1.371 
1.535 
1.621 


3.642 
3.542 
3.448 
3.358 
3.272 
3.193 
3.119 
3.051 
2.987 
2.928 
2.874 
2.823 
2.776 
2.733 
2.692 
2.654 
2.619 
2.586 
2.555 
2.526 
2.499 
2.473 
2.367 
2.287 
2.225 
2.177 
2.138
2.106 
2.080 
2.059 
2.040 
2.025 
2.012 
2.000 
1.940 
1.919 


0.063 
0.091 
0.120 
0.153 
0.186 
0.221 
0.256 
0.291 
0.325 
0.359 
0.392 
0.425 
0.457 
0.488 
0.518 
0.547 
0.575 
0.602 
0.628 
0.653 
0.678 
0.788 
0.882 
0.961 
1.029 
1.088 
1.139 
1.184 
1.224 
1.260 
1.292 
1.321 
1.347 
1.519 
1.610 


3.676 
3.583 
3.495 
3.409 
3.327
3.251 
3.179 
3.112
3.050 
2.992 
2.937 
2.887 
2.840 
2.796 
2.754 
2.716 
2.680 
2.646 
2.614 
2.585
2.557 
2.439 
2.350 
2.281 
2.227 
2.183 
2.148 
2.118 
2.093
2.073 
2.055 
2.040 
2.026 
1.956 
1.931 


0.058 
0.083 
0.110 
0.141 
0.172 
0.205 
0.238 
0.271 
0.305 
0.337 
0.370 
0.401 
0.432 
0.462 
0.492 
0.520 
0.548 
0.575 
0.600 
0.626 
0.740 
0.836 
0.919 
0.990 
1.052 
1.105 
1.153 
1.195 
1.232 
1.266 
1.296
1.324 
1.504 
1.599 


3.705 
3.619 
3.535 
3.454 
3.376 
3.303 
3.233 
3.168 
3.107 
3.050 
2.996 
2.946 
2.899 
2.854 
2.813 
2.774 
2.738 
2.703 
2.671 
2.641 
2.512 
2.414 
2.338 
2.278 
2.229 
2.189 
2.156 
2.129 
2.105 
2.085 
2.068 
2.053 
1.972 
1.943 


0.052 
0.076 
0.101 
0.130 
0.160 
0.191 
0.222 
0.254 
0.286 
0.317 
0.349 
0.379 
0.409 
0.439 
0.467 
0.495 
0.522 
0.549 
0.575 
0.692 
0.792 
0.877 
0.951 
1.016 
1.072 
1.121 
1.165 
1.205 
1.240 
1.271 
1.301 
1.489
1.588 


Note: n = number of observations, k’ = number of explanatory variables excluding the constant term. 
Source: This table is an extension of the original Durbin-Watson table and is reproduced from N. E. Savin and K. J. White, “The Durbin-Watson Test for Serial Correlation
with Extreme Small Samples or Many Regressors,” Econometrica, vol. 45, November 1977, pp. 1989-96, and as corrected by R. W. Farebrother, Econometrica, vol. 48,
September 1980, p. 1554. Reprinted by permission of the Econometric Society.
3.731
3.650 
3.572 
3.494 
3.420 
3.349 
3.283 
3.219 
3.160 
3.103 
3.050 
3.000 
2.954 
2.910 
2.868 
2.829 
2.792 
2.757 
2.724 
2.586 
2.479 
2.396 
2.330 
2.276 
2.232 
2.195 
2.165 
2.139 
2.116 
2.097 
2.080 
1.989 
1.955 


3.773 
3.702 
3.632 
3.563 
3.495 
3.431 
3.368 
3.309 
3.252 
3.198 
3.147 
3.099
3.053 
3.009 
2.968 
2.929 
2.892 
2.733 
2.610 
2.512 
2.434 
2.371 
2.318 
2.275 
2.238 
2.206 
2.179 
2.156 
2.135 
2.023 
1.979 


0.041 
0.060 
0.081 
0.104 
0.129 
0.156 
0.183 
0.211 
0.239 
0.267 
0.295 
0.323 
0.351 
0.378 
0.404 
0.430 
0.553 
0.660 
0.754 
0.836 
0.908 
0.971 
1.027 
1.076 
1.121
1.160 
1.197 
1.229 
1.443 
1.554


3.790 
3.724 
3.658 
3.592 
3.528 
3.465 
3.406 
3.348 
3.293 
3.240 
3.190 
3.142 
3.097 
3.054 
3.013 
2.974 
2.807 
2.675 
2.571
2.487
2.419 
2.362 
2.315 
2.275 
2.241
2.211
2.186
2.164 
2.040 
1.991 


Example 1
If n = 40 and k' = 4, then at the 0.05 level d_L = 1.285 and d_U = 1.721. If a computed d value is less than 1.285, there is evidence of
positive first-order serial correlation; if it is greater than 1.721, there is no evidence of positive first-order serial
correlation; but if d lies between the lower and upper limits, the evidence regarding the
presence or absence of positive first-order serial correlation is inconclusive.
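The three-way decision rule in Example 1 can be sketched in Python. The sketch assumes the usual definition d = Σ(e_t − e_{t−1})² / Σe_t², computed from the OLS residuals e_t of a regression with k' = 4 explanatory variables. The function names and the synthetic residual series are illustrative assumptions, not taken from the table's source. Like the example, it addresses only positive first-order serial correlation.

```python
import math


def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def dw_decision(d, d_lower, d_upper):
    """Decision rule of Example 1 (test for positive first-order serial correlation)."""
    if d < d_lower:
        return "evidence of positive first-order serial correlation"
    if d > d_upper:
        return "no evidence of positive first-order serial correlation"
    return "inconclusive"  # d_lower <= d <= d_upper


# Bounds for n = 40, k' = 4 at the 0.05 level, as quoted in Example 1.
D_L, D_U = 1.285, 1.721

# Hypothetical, smoothly varying residuals (n = 40) used only to exercise the rule.
residuals = [math.sin(t / 5) for t in range(40)]
d = durbin_watson(residuals)
print(f"d = {d:.3f}: {dw_decision(d, D_L, D_U)}")
```

Here a value of d equal to either bound is treated as inconclusive. That is a boundary convention chosen for the sketch; the table itself does not settle it.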
Table D.5B Durbin-Watson d Statistic: Significance Points of d_L and d_U at 0.01 Level of Significance

k'=1 k'=2 k'=3 k'=4 k'=5 k'=6 k'=7 k'=8 k'=9 k'=10
n d_L d_U d_L d_U d_L d_U d_L d_U d_L d_U d_L d_U d_L d_U d_L d_U d_L d_U d_L d_U
6 0.390 1.142 - - - - - - - - - - - - - - - - - -
7 0.435 1.036 0.294 1.676 - - - - - - - - - - - - - - - -
8 0.497 1.003 0.345 1.489 0.229 2.102 - - - - - - - - - - - - - -
9 0.554 0.998 0.408 1.389 0.279 1.875 0.183 2.433 - - - - - - - - - - - -
10 0.604 1.001 0.466 1.333 0.340 1.733 0.230 2.193 0.150 2.690 - - - - - - - - - -
11 0.653 1.010 0.519 1.297 0.396 1.640 0.286 2.030 0.193 2.453 0.124 2.892 - - - - - - - -
12 0.697 1.023 0.569 1.274 0.449 1.575 0.339 1.913 0.244 2.280 0.164 2.665 0.105 3.053 - - - - - -
13 0.738 1.038 0.616 1.261 0.499 1.526 0.391 1.826 0.294 2.150 0.211 2.490 0.140 2.838 0.090 3.182 - - - -
14 0.776 1.054 0.660 1.254 0.547 1.490 0.441 1.757 0.343 2.049 0.257 2.354 0.183 2.667 0.122 2.981 0.078 3.287 - -
15 0.811 1.070 0.700 1.252 0.591 1.464 0.488 1.704 0.391 1.967 0.303 2.244 0.226 2.530 0.161 2.817 0.107 3.101 0.068 3.374 
16 0.844 1.086 0.737 1.252 0.633 1.446 0.532 1.663 0.437 1.900 0.349 2.153 0.269 2.416 0.200 2.681 0.142 2.944 0.094 3.201 
17 0.874 1.102 0.772 1.255 0.672 1.432 0.574 1.630 0.480 1.847 0.393 2.078 0.313 2.319 0.241 2.566 0.179 2.811 0.127 3.053 
18 0.902 1.118 0.805 1.259 0.708 1.422 0.613 1.604 0.522 1.803 0.435 2.015 0.355 2.238 0.282 2.467 0.216 2.697 0.160 2.925
19 0.928 1.132 0.835 1.265 0.742 1.415 0.650 1.584 0.561 1.767 0.476 1.963 0.396 2.169 0.322 2.381 0.255 2.597 0.196 2.813
20 0.952 1.147 0.863 1.271 0.773 1.411 0.685 1.567 0.598 1.737 0.515 1.918 0.436 2.110 0.362 2.308 0.294 2.510 0.232 2.714 
21 0.975 1.161 0.890 1.277 0.803 1.408 0.718 1.554 0.633 1.712 0.552 1.881 0.474 2.059 0.400 2.244 0.331 2.434 0.268 2.625 
22 0.997 1.174 0.914 1.284 0.831 1.407 0.748 1.543 0.667 1.691 0.587 1.849 0.510 2.015 0.437 2.188 0.368 2.367 0.304 2.548 
23 1.018 1.187 0.938 1.291 0.858 1.407 0.777 1.534 0.698 1.673 0.620 1.821 0.545 1.977 0.473 2.140 0.404 2.308 0.340 2.479 
24 1.037 1.199 0.960 1.298 0.882 1.407 0.805 1.528 0.728 1.658 0.652 1.797 0.578 1.944 0.507 2.097 0.439 2.255 0.375 2.417 
25 1.055 1.211 0.981 1.305 0.906 1.409 0.831 1.523 0.756 1.645 0.682 1.776 0.610 1.915 0.540 2.059 0.473 2.209 0.409 2.362
26 1.072 1.222 1.001 1.312 0.928 1.411 0.855 1.518 0.783 1.635 0.711 1.759 0.640 1.889 0.572 2.026 0.505 2.168 0.441 2.313
27 1.089 1.233 1.019 1.319 0.949 1.413 0.878 1.515 0.808 1.626 0.738 1.743 0.669 1.867 0.602 1.997 0.536 2.131 0.473 2.269 
28 1.104 1.244 1.037 1.325 0.969 1.415 0.900 1.513 0.832 1.618 0.764 1.729 0.696 1.847 0.630 1.970 0.566 2.098 0.504 2.229
29 1.119 1.254 1.054 1.332 0.988 1.418 0.921 1.512 0.855 1.611 0.788 1.718 0.723 1.830 0.658 1.947 0.595 2.068 0.533 2.193 
30 1.133 1.263 1.070 1.339 1.006 1.421 0.941 1.511 0.877 1.606 0.812 1.707 0.748 1.814 0.684 1.925 0.622 2.041 0.562 2.160
31 1.147 1.273 1.085 1.345 1.023 1.425 0.960 1.510 0.897 1.601 0.834 1.698 0.772 1.800 0.710 1.906 0.649 2.017 0.589 2.131 
32 1.160 1.282 1.100 1.352 1.040 1.428 0.979 1.510 0.917 1.597 0.856 1.690 0.794 1.788 0.734 1.889 0.674 1.995 0.615 2.104
33 1.172 1.291 1.114 1.358 1.055 1.432 0.996 1.510 0.936 1.594 0.876 1.683 0.816 1.776 0.757 1.874 0.698 1.975 0.641 2.080
34 1.184 1.299 1.128 1.364 1.070 1.435 1.012 1.511 0.954 1.591 0.896 1.677 0.837 1.766 0.779 1.860 0.722 1.957 0.665 2.057 
35 1.195 1.307 1.140 1.370 1.085 1.439 1.028 1.512 0.971 1.589 0.914 1.671 0.857 1.757 0.800 1.847 0.744 1.940 0.689 2.037
36 1.206 1.315 1.153 1.376 1.098 1.442 1.043 1.513 0.988 1.588 0.932 1.666 0.877 1.749 0.821 1.836 0.766 1.925 0.711 2.018 
37 1.217 1.323 1.165 1.382 1.112 1.446 1.058 1.514 1.004 1.586 0.950 1.662 0.895 1.742 0.841 1.825 0.787 1.911 0.733 2.001 
38 1.227 1.330 1.176 1.388 1.124 1.449 1.072 1.515 1.019 1.585 0.966 1.658 0.913 1.735 0.860 1.816 0.807 1.899 0.754 1.985 
39 1.237 1.337 1.187 1.393 1.137 1.453 1.085 1.517 1.034 1.584 0.982 1.655 0.930 1.729 0.878 1.807 0.826 1.887 0.774 1.970 
40 1.246 1.344 1.198 1.398 1.148 1.457 1.098 1.518 1.048 1.584 0.997 1.652 0.946 1.724 0.895 1.799 0.844 1.876 0.794 1.956
45 1.288 1.376 1.245 1.423 1.201 1.474 1.156 1.528 1.111 1.584 1.065 1.643 1.019 1.704 0.974 1.768 0.927 1.834 0.881 1.902 
50 1.324 1.403 1.285 1.446 1.245 1.491 1.205 1.538 1.164 1.587 1.123 1.639 1.081 1.692 1.039 1.748 0.997 1.805 0.955 1.864 
55 1.356 1.427 1.320 1.466 1.284 1.506 1.247 1.548 1.209 1.592 1.172 1.638 1.134 1.685 1.095 1.734 1.057 1.785 1.018 1.837 
60 1.383 1.449 1.350 1.484 1.317 1.520 1.283 1.558 1.249 1.598 1.214 1.639 1.179 1.682 1.144 1.726 1.108 1.771 1.072 1.817
65 1.407 1.468 1.377 1.500 1.346 1.534 1.315 1.568 1.283 1.604 1.251 1.642 1.218 1.680 1.186 1.720 1.153 1.761 1.120 1.802 
70 1.429 1.485 1.400 1.515 1.372 1.546 1.343 1.578 1.313 1.611 1.283 1.645 1.253 1.680 1.223 1.716 1.192 1.754 1.162 1.792
75 1.448 1.501 1.422 1.529 1.395 1.557 1.368 1.587 1.340 1.617 1.313 1.649 1.284 1.682 1.256 1.714 1.227 1.748 1.199 1.783 
80 1.466 1.515 1.441 1.541 1.416 1.568 1.390 1.595 1.364 1.624 1.338 1.653 1.312 1.683 1.285 1.714 1.259 1.745 1.232 1.777 
85 1.482 1.528 1.458 1.553 1.435 1.578 1.411 1.603 1.386 1.630 1.362 1.657 1.337 1.685 1.312 1.714 1.287 1.743 1.262 1.773 
90 1.496 1.540 1.474 1.563 1.452 1.587 1.429 1.611 1.406 1.636 1.383 1.661 1.360 1.687 1.336 1.714 1.312 1.741 1.288 1.769 
95 1.510 1.552 1.489 1.573 1.468 1.596 1.446 1.618 1.425 1.642 1.403 1.666 1.381 1.690 1.358 1.715 1.336 1.741 1.313 1.767 
100 1.522 1.562 1.503 1.583 1.482 1.604 1.462 1.625 1.441 1.647 1.421 1.670 1.400 1.693 1.378 1.717 1.357 1.741 1.335 1.765 
150 1.611 1.637 1.598 1.651 1.584 1.665 1.571 1.679 1.557 1.693 1.543 1.708 1.530 1.722 1.515 1.737 1.501 1.752 1.486 1.767 
200 1.664 1.684 1.653 1.693 1.643 1.704 1.633 1.715 1.623 1.725 1.613 1.735 1.603 1.746 1.592 1.757 1.582 1.768 1.571 1.779 
n d_L


d_U


16 0.060 
17 0.084 
18 0.113 
19 0.145 
20 0.178 
21 0.212 
22 0.246 
23 0.281 
24 0.315 
25 0.348 
26 0.381 
27 0.413 
28 0.444 
29 0.474 
30 0.503 
31 0.531 
32 0.558 
33 0.585 
34 0.610 
35 0.634 
36 0.658 
37 0.680 
38 0.702 
39 0.723
40 0.744 
45 0.835 
50 0.913 
55 0.979
60 1.037 
65 1.087 
70 1.131 
75 1.170 
80 1.205 
85 1.236 
90 1.264 
95 1.290 
100 1.314 
150 1.473 
200 1.561 


3.446 
3.286 
3.146 
3.023 
2.914 
2.817 
2.729 
2.651 
2.580 
2.517 
2.460 
2.409 
2.363 
2.321 
2.283 
2.248 
2.216 
2.187 
2.160 
2.136 
2.113 
2.092 
2.073 
2.055 
2.039 
1.972 
1.925 
1.891 
1.865 
1.845 
1.831 
1.819 
1.810 
1.803 
1.798 
1.793 
1.790 
1.783 
1.791 


0.053 
0.075 
0.102 
0.131 
0.162 
0.194 
0.227 
0.260 
0.292 
0.324 
0.356 
0.387 
0.417 
0.447 
0.475 
0.503 
0.530 
0.556 
0.581 
0.605 
0.628 
0.651 
0.673 
0.694 
0.790 
0.871 
0.940 
1.001 
1.053 
1.099 
1.141 
1.177 
1.210 
1.240 
1.267 
1.292 
1.458 
1.550 


3.506 
3.358 
3.227 
3.109 
3.004 
2.909 
2.822 
2.744 
2.674 
2.610 
2.552
2.499 
2.451 
2.407 
2.367 
2.330 
2.296 
2.266 
2.237 
2.210 
2.186 
2.164 
2.143 
2.123 
2.044 
1.987 
1.945 
1.914 
1.889 
1.870 
1.856 
1.844 
1.834 
1.827 
1.821 
1.816 
1.799 
1.801 


Note: n = number of observations.
k' = number of explanatory variables excluding the constant term.


Source: Savin and White, op. cit., by permission of the Econometric Society. 
Table D.6A Critical Values of Runs in the Runs Test


N2 
N1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2 2 2 2 2 2 2 2 2 2 
3 PL 2? 22 2 2 2 2 2 3 3 3 3 3 3 
4 Ey mpy apie Te 3 3 3 3 3 3 4 4 4 4 4 
5 2 2o o |S 3 4 4 4 4 4 4 4 5 5 5 
6 2m a’ a A 4 4 4 5 5 5 5 5 5 6 6 
7 ame? se 3 eS a 4 5 5) 5 5 5 6 6 6 6 6 6 
8 2 3 3 3 4 4 5 5 5 6 6 6 6 6 7 74 7 7 
9 2 Ss 3) 4 4 oe 5 6 6 6 i 7 vi 7 8 8 8 
10 2 @ 3.4 5°55 6 6 7 Z 7 7 8 8 8 8 9 
11 m2 82 4 #4 G 6 7 7 7 8 8 8 9 9 9 9 
2 ae? Bs 4 4 5 6 6 4 7 7 8 8 8 9 9 9 10 10 
BP 92 3 4 5353 6'6 7 7 8 8 9 9 9 10 10 10 TIO 
142 2 åB A 55 5 Gee 7 7 8 8 9 9 9 10 10 #10 11 11 
15 2 3 =æ 4 = 6 6 7 7 8 8 9 9 10 10 11 11 U1 12 
16 T2 Ss @4 4 Sa 6 67 8 8 9 9 10> tO ali 11 11 12 12 
1 e eS A 4S 6 77 8 9 9 alo aml l 11 11 127 7 2 T13 
18 2 oe @4 5 5 46, of & 8 9 O 1) a0 11 I2 12 ea Te 
19 a2 eS a4 5 óa 6 s% S 8 9 10 10 1I 11 T2 2a o eS 
20 = we 4 5 6-6 7 8 9 Cy ea 12 12. 13. asi ue) 4! 


Note: Tables D.6A and D.6B give the critical values of runs n for various values of N1 (+ symbol) and N2 (- symbol). For the one-sample runs test, any value
of n that is equal to or smaller than that shown in Table D.6A or equal to or larger than that shown in Table D.6B is significant at the 0.05 level.

Source: Sidney Siegel, Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill Book Company, New York, 1956, table F, pp. 252-253. The tables have been
adapted by Siegel from the original source: Frieda S. Swed and C. Eisenhart, “Tables for Testing Randomness of Grouping in a Sequence of Alternatives,” Annals of
Mathematical Statistics, vol. 14, 1943. Used by permission of McGraw-Hill Book Company and Annals of Mathematical Statistics.


Table D.6B Critical Values of Runs in the Runs Test 


N2 
N1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
2 
3 
4 9 9 
5 9 10 10 11 11 
6 Cy MO M iv R ies BEBI 
i iW 2 13 13 4 14 14 TeS Se lS 
8 11 12 13 14 14 15 15 16 16 l6 l6 I7 V 
9 13 14 14 15 16 16 16 17 17 18 18 18 
10 13 14 15 16 16 #17 17 18 list We “19> as 
11 13 14 15 16 17 17 18 19 19 19 20 20 
12 13 14 16 16 17 18 19 19 208 2 mi Z 
13 15 16 17 18 19 19 20 20 21 $2] 22 
14 15 16 17 18 19 20 20 21 2 R Z 
15 1S 6 Ws We 12% 20 21 2 œ B3 Z 
16 17 18 19 20521 Zi 22 B 23 29 
17 17 18 19 20" 21 22 23 28 PA T5 
18 17 18 19 20 21 22m 23 2A PSs 
19 17 18720 21 2 23T 23 2A ESE 
20 17 18 20 21 22 23 2 2% 25 ve 


18 


19 


17 
18 
20 
21 
22 
25 
23 
24 
25 
26 
26 
27 
27 


20 
Example 2

In a sequence of 30 observations consisting of 20 + signs (= N1) and 10 - signs (= N2), the critical values of
runs at the 0.05 level of significance are 9 and 20, as shown by Tables D.6A and D.6B, respectively. Therefore,
if in an application the number of runs is found to be equal to or less than 9 or equal to or greater than 20,
one can reject (at the 0.05 level of significance) the hypothesis that the observed sequence is random.
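Counting runs and applying the rule from the note to Tables D.6A and D.6B can be sketched as follows. A run is taken to be any maximal block of identical consecutive signs. The critical values 9 and 20 are those quoted in Example 2 and apply only when N1 = 20 and N2 = 10; the three sign sequences and the function names are hypothetical.

```python
def count_runs(signs):
    """Number of runs: maximal blocks of identical consecutive symbols."""
    if len(signs) == 0:
        return 0
    runs = 1
    for previous, current in zip(signs, signs[1:]):
        if current != previous:
            runs += 1
    return runs


def reject_randomness(n_runs, lower, upper):
    """Reject randomness if n_runs <= lower (Table D.6A) or n_runs >= upper (Table D.6B)."""
    return n_runs <= lower or n_runs >= upper


# Critical values for N1 = 20 (+) and N2 = 10 (-) at the 0.05 level, from Example 2.
LOWER, UPPER = 9, 20

# Three hypothetical sequences, each with 20 "+" and 10 "-" signs.
clustered = "+" * 20 + "-" * 10  # 2 runs: too few
alternating = "++-" * 10  # 20 runs: too many
mixed = "".join(["+++", "--", "+++", "-", "+++", "--", "+++",
                 "-", "+++", "--", "+++", "-", "++", "-"])  # 14 runs

for name, seq in [("clustered", clustered), ("alternating", alternating), ("mixed", mixed)]:
    runs = count_runs(seq)
    verdict = "reject" if reject_randomness(runs, LOWER, UPPER) else "do not reject"
    print(name, runs, verdict)
```

Both too few runs (long clusters of one sign) and too many runs (systematic alternation) lead to rejection, which is why the test uses a lower and an upper critical value.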


Table D.7 1% and 5% Critical Dickey-Fuller t (= τ) and F Values for Unit Root Tests

Sample    τ_nc*           τ_c*            τ_ct*           F†              F‡
Size      1%      5%      1%      5%      1%      5%      1%      5%      1%      5%
25        -2.66   -1.95   -3.75   -3.00   -4.38   -3.60   10.61   7.24    8.21    5.68
50        -2.62   -1.95   -3.58   -2.93   -4.15   -3.50   9.31    6.73    7.02    5.13
100       -2.60   -1.95   -3.51   -2.89   -4.04   -3.45   8.73    6.49    6.50    4.88
250       -2.58   -1.95   -3.46   -2.88   -3.99   -3.43   8.43    6.34    6.22    4.75
500       -2.58   -1.95   -3.44   -2.87   -3.98   -3.42   8.34    6.30    6.15    4.71
∞         -2.58   -1.95   -3.43   -2.86   -3.96   -3.41   8.27    6.25    6.09

*Subscripts nc, c, and ct denote, respectively, that there is no constant, a constant, and a constant and trend term in the regression Eq. (21.9.5).
†The critical F values are for the joint hypothesis that the constant and δ terms in Eq. (21.9.5) are simultaneously equal to zero.
‡The critical F values are for the joint hypothesis that the constant, trend, and δ terms in Eq. (21.9.5) are simultaneously equal to zero.

Source: Adapted from W. A. Fuller, Introduction to Statistical Time Series, John Wiley & Sons, New York, 1976, p. 373 (for the τ test), and D. A. Dickey and W. A. Fuller,
“Likelihood Ratio Statistics for Autoregressive Time Series with a Unit Root,” Econometrica, vol. 49, 1981, p. 1063.
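The τ columns of Table D.7 are used as follows: the unit-root null hypothesis is rejected at the chosen level when the computed τ statistic is more negative than the tabulated critical value. The Python sketch below stores the 5% τ values for the finite sample sizes and applies that rule. Choosing the row for the largest tabulated sample size not exceeding n is an assumption made here for simplicity. It errs on the conservative side, because the critical values become less negative as the sample grows; interpolating between rows is a common alternative. The function names and the τ statistics in the usage lines are hypothetical.

```python
# 5% critical tau values from Table D.7 (finite sample sizes only).
# Keys: "nc" = no constant, "c" = constant, "ct" = constant and trend.
CRITICAL_TAU_5PCT = {
    "nc": {25: -1.95, 50: -1.95, 100: -1.95, 250: -1.95, 500: -1.95},
    "c": {25: -3.00, 50: -2.93, 100: -2.89, 250: -2.88, 500: -2.87},
    "ct": {25: -3.60, 50: -3.50, 100: -3.45, 250: -3.43, 500: -3.42},
}


def critical_tau(n, deterministic="c"):
    """Critical value for the largest tabulated sample size not exceeding n (assumed rule)."""
    table = CRITICAL_TAU_5PCT[deterministic]
    eligible = [size for size in table if size <= n]
    if not eligible:
        raise ValueError("sample size is below the smallest tabulated size (25)")
    return table[max(eligible)]


def reject_unit_root(tau_stat, n, deterministic="c"):
    """Reject the unit-root null if the computed tau is more negative than the critical value."""
    return tau_stat < critical_tau(n, deterministic)


# Hypothetical tau statistics from a regression with a constant, n = 60.
for tau in (-3.10, -2.50):
    print(tau, critical_tau(60), reject_unit_root(tau, 60))
```

For n = 60 with a constant, the sketch uses the n = 50 row (-2.93), so τ = -3.10 leads to rejection of the unit-root hypothesis and τ = -2.50 does not.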
Asymmetry, 498
Asymptote, 179 
Asymptotic properties, 78 
Asymptotically normal distribution, 534 
A-theoretic models, 835 
Augmented Dickey—Fuller (ADF) test, 
798-800 
Augmented Engle—Granger (AEG) test, 
806-807 
Autocorrelation, 436-474 
ARCH/GARCH models, 471-472 
assumption of no, 72-73 
BLUE estimator in presence of, 445 
defined, 437 
detecting, in autoregressive models,
671-673 
detection of, 453-463
Breusch-Godfrey test, 461-463
Durbin-Watson d test, 457-461
graphical method, 453-455
runs test, 455-457
and dummy variables, 315 
dummy variables in, 471 
example of, 472-474 
GLS method of correcting for, 
464-470 
and heteroscedasticity, 472 
with heteroscedasticity, 472 
nature of, 437-442 
and Newey—West method, 470 
OLS estimation in presence of, 
443-445 
proofs, 492 
pure, 463-464 
remedial measures for, 463 
and selection of method, 470 
wages-productivity example of, 
451-453 
Autocorrelation coefficients, 796-797 
Autocorrelation function (ACF),
792-796
Autoregression, 440 
Autoregressive and moving average 
(ARMA) process, 823 
Autoregressive conditional 
heteroscedasticity (ARCH) 
effect: 
and Durbin—Watson d, 842 
in volatility measurement, 841-842 
Autoregressive conditional
heteroscedasticity (ARCH) 
model, 471-472 
of U.S. inflation rate, 844-845 
in volatility measurement, 838 
Autoregressive integrated moving 
average (ARIMA) model, 
821-824 
estimation of, 829 
of yen/dollar exchange rate, 845
Autoregressive (AR) models, 457 
detecting autocorrelation in, 671-673 
estimation of, 668-670 
examples of, 673-679 
instrumental variables method, 
670-671 
Auxiliary regression, 358 


B 


Balanced panel, 24 
Ballentine view, 78 
Bank money creation, 654-655 
Base category, 299 
Bayesian approach, 9 
Behavioral equations, 727 
Benchmark category, 299 
Berenblut-Webb test, 467
Bernoulli probability distribution, 572 
Best linear unbiased estimator (BLUE), 
77 
Best unbiased estimators (BUE), 109, 
249n 
Beta coefficient, 171 
BG test (see Breusch—Godfrey test) 
Bias (See also Unbiasedness) 
errors of measurement, 494 
excluded variable specification, 
439 
in indirect least squares estimators,
777-778
model specification, 492 
pretest, 219n 
self-selection, 523 
simultaneous-equation, 715-717 
specification (see Specification bias) 
Bilateral causality, 687 
Binary response variable, 571 
Binary variable, 570 
Binomial distribution, 572 
Bivariate normal probability density 
function, 113-114 
Bivariate regression (see Two-variable 
regression analysis) 
BJ methodology (see Box—Jenkins 
methodology) 
BLUE (see Best linear unbiased 
estimator) 
Bond rating prediction, 580 
Bootstrapping, 534 
Bottom-up approach, 499 
Box—Cox regression model, 202 
Box—Cox transformation, 562 
Box—Jenkins (BJ) methodology, 
820-821 
diagnostic checking, 829 
estimation of ARIMA model, 829 
forecasting, 830-831 
identification, 825-829 
steps of, 824-825 
Box-Pierce Q statistic, 796-797 
Breusch—Godfrey (BG) test, 462 


Breusch—Pagan (BP) test, 636 

Breusch—Pagan—Godfrey (BPG) test, 
405-406

BUE (see Best unbiased estimators) 

Bull’s eye, 19n, 783n 


C


Capital asset pricing model (CAPM), 
159-160 
Cauchy—Schwarz inequality, 91 
Causal models (see Recursive models) 
Causality: 
in economics, 686-691 
and exogeneity, 691 
and VAR model, 834
Causality tests, 834 
Causation, regression vs., 19 
C-D production function 
(see Cobb—Douglas production 
function) 
CDF (see Cumulative distribution 
function) 
CEF (conditional expectation function), 
41 
Censored sample, 602 
Central limit theorem (CLT), 106 
CES production (see Constant elasticity 
of substitution production) 
Characteristic line, 440 
Chi-square test, 126 
Chi-square test of significance, 126 
Chow test, 272-275 
Churning, 534 
Classical approach, 9 
Classical linear regression model 
(CLRM): 
assumptions, 67—74 
defined, 13 
examples of, 81-88 
Gauss—Markov theorem, 76-78 
goodness of fit, 78-82 
and Monte Carlo experiments, 88, 89 
precision/standard errors, 74-76 
problems in applying, 338 
Classical normal linear regression model 
(CNLRM), 105-109 
defined, 13 
maximum likelihood method, 109 
normality assumption, 106-109 
probability distribution of 
disturbances, 105-106 


Classical regression analysis, 19n 
Classical theory of statistical inference, 
105 
CLFPR (see Civilian labor force 
participation rate) 
CLRM (see Classical linear regression 
model) 
CLT (see Central limit theorem) 
CNLRM (see Classical normal linear 
regression model) 
C-O iterative method 
(see Cochrane-Orcutt iterative
method) 
Cobb-Douglas (C-D) production 
function, 9 
EViews output of, 247-248 
example of, 221-222 
of Mexican economy, 559 
properties of, 220-221 
Cobweb phenomenon, 440 
Cochrane-Orcutt (C-O) iterative method,
468 
Coefficient of adjustment, 667 
Coefficient of autocorrelation at lag 1, 
443 
Coefficient of autocovariance, 443 
Coefficient of correlation (R), 82 
Coefficient of determination (R²), 78
allocating, among regressors, 219 
comparing two, 216-219
and F test, 257-258 
multiple, 210-211 
in multiple regression, 215-220 
testing overall significance in terms 
of, 258-259 
two-variable regression model 
estimation problem, 78-82 
Coefficient of expectation, 664 
Coefficient of partial determination, 227 
Cohen—Rea—Lerman study, 578-580 
Coherency, data, 493 
Cohort analysis, 622 
Coincident regressions, 303 
Cointegrated time series, 805-808 
Cointegrated variables, 805 
Cointegrating parameter, 805 
Cointegrating regression, 805 
Cointegration, testing for, 805-807 
Collinearity, 204, 340n 
(See also Multicollinearity) 
Common logarithms, 199 
Commutative property, 842 
Comparison category, 299 


Compatibility, 120 
Composite hypothesis, 120 
Compound rate of growth, 176 
Computers, 10 
Concurrent regressions, 303 
Condition index, 358 
Conditional expectation function (CEF), 
41 
Conditional expected value, 39 
Confidence band, 135 
Confidence coefficient, 116 
Confidence interval(s), 135 
for β₁ and β₂ simultaneously, 119
for β₂, 117-119
defined, 116
and multicollinearity, 349
for σ², 119-120
Confidence-interval hypothesis testing, 
121-122 
Confidence limits, 116 
Confidentiality, 27 
Consistency, 105, 107 
Constancy, parameter, 493 
Constant coefficients model 
(see Pooled OLS regression 
model) 
Constant elasticity model, 173 
Constant elasticity of substitution (CES) 
production, 9 
Constant variance of uᵢ (assumption 4),
70-71 
Consumption function, 3 
Control category, 299 
Control purposes, model used for, 8 
Control variables, 8 
Corporate profits (CP), 781, 782 
Correlation(s): 
assumption of no serial, 72—73 
auto- (see Autocorrelation) 
pair-wise, 356 
partial, 357 
regression vs., 20 
Correlation analysis, 20 
Correlation coefficient(s), 20 
of zero order, 226 
Correlation matrix, 366 
Correlogram, 792-796 
Cost analysis theory, 160, 161 
Count data, 571 
Count data modeling, 604—607 
Count R², 591
Count type, 604 
Covariance, 101 
Covariance stationary, 784

Covariates, 302 

CP (see Corporate profits) 

CPI (see Consumer Price Index) 

CPS (Current Population Survey), 524 

Critical χ² values, 120

Critical Dickey—Fuller t and F values for 
unit root tests, 893 

Critical level, 594 

Critical regions, 123 

Critical t values, 122 

Critical values, 116 

Cross-section studies, 436 

Cross-sectional data, 21 

Cumulative distribution function (CDF), 
551, 594


D


Daily data, 21-22
Data: 
coherency of, 493 
manipulation of, 441 
observational vs. experimental, 2
obtaining, 4—6 
unavailability of, 45 
Data for economic analysis, 21-27 
accuracy of, 27 
cross-section, 22—24 
panel/longitudinal/micropanel, 25, 26 
sources of, 25 
time-series, 21—22 
types of, 21 
Data generating process (DGP), 781 
Data grubbing, 499 
Data mining, 499-500 
Data snooping, 499 
Data transformation, 441 
Davidson—MacKinnon J test, 513-515 
Debit cards, 580, 593-594 
Decennial data, 21 
Degrees of freedom (df), 75 
Demand-and-supply model, 710-711 
Denominator degrees of freedom, 155 
Dependent variable, 3, 13 
Derivative-free method, 557 
Deseasonalization, 307 
Deterministic component, 44 
Deterministic relationship, 4, 19 
Deterministic time series, 788 
Deterministic trend, 788 
Deterministic trend with stationary 
AR(1) component, 789 
Detrended time series, 788
Detrending, 788 
Deviation form, 66 
Df (degrees of freedom), 75 
DF test (see Dickey—Fuller test) 
DGP (data generating process), 781 
Diagnostic checking, 829 
Dichotomous dependent variable 
models, 315 
Dichotomous variable, 570 
Dickey—Fuller (DF) test, 798-800 
Dickey—Pantula test, 802 
Difference form, 441 
Difference stationary process (DSP), 
787 
Difference stationary (DS) stochastic 
processes, 787-789 
Differential intercept coefficients, 299 
Differential intercept dummy 
technique, 627 
Differential slope coefficients, 305 
Differential slope dummy 
coefficients, 629 
Direct optimization, 556 
Direct search method, 556 
Discerning approach, to non-nested 
hypotheses tests, 512-516 
Discrimination approach, to non-nested 
hypotheses tests, 512 
Disequilibrium models, 316 
Disposable personal income (DPI), 
781, 782 
Dissimilar regressions, 303 
Distributed-lag models, 512 
Distributed-lag multiplier, 654 
Disturbance term, 4 
Disturbances: 
assumption of no autocorrelation 
between, 72-73 
heteroscedastic variances of, 573-574 
non-normality of, 573 
probability distribution of, 105—106 
Dividends, 781, 782 
“Doing nothing,” 360 
Double-log model, 172 
Downward trend, 177 
DPI (see Disposable personal income) 
Drift parameter, 786 
DS stochastic processes (see Difference 
stationary stochastic processes) 
DSP (see Difference stationary process) 
Dummy variables: 
in ANCOVA models, 302 


in ANOVA models, 296-301 
and autocorrelation, 315 
Chow-test alternative, 303-305 
defined, 296 
as dependent variables, 315 
example of, 316-320 
guidelines for using, 299-300 
and heteroscedasticity, 315 
interaction effects using, 306-307 
nature of, 295-296 
in panel data models, 314 
in piecewise linear regression, 
311-313 
for seasonal analysis, 307-311 
semilogarithmic regressions, 
314 
topics for study, 316 
Dummy variables method, 307, 310n 
Dummy-variable trap, 299
Duration models, 608 
Durbin h test, 671-673
Durbin’s h statistic, 491 
Durbin’s M test, 462 
Durbin—Watson d statistic, 457 
and ARCH effect, 843 
ρ based on, 467
Durbin—Watson d test, 457-461 
Dynamic regression models, 652 


E 


ECM (see Error correction mechanism) 
Econometric model(s):
applications of, 8
of consumption, 4
estimation of, 5, 7
example of, 4
Klein’s, 715
selection of, 8-9
Chow’s prediction failure test in, 522-523
examples of, 524-532
guidelines for, 535
measurement errors, 506-510
in dependent variable Y, 506-507
example, 509-510
in explanatory variable X, 507-508
model selection criteria, 493
adjusted R², 517
Akaike’s information criterion, 517-518
caution about criteria, 519-520
forecast chi-square, 519-520
Mallows’s Cp criterion, 518-519
R² criterion, 517
Schwarz’s information criterion, 518
nested vs. non-nested models, 
510-511 
non-normal error distribution in, 
533-534 
outliers/leverage/influence in, 
520-521 
recursive least squares in, 521-522 
specification errors 
consequences of, 495-498 
tests of, 499-506 
types of, 493-495 
stochastic error term specification, 510 
stochastic explanatory variables in, 
534-535 
tests of non-nested hypotheses, 
511-516 
Davidson—MacKinnon J test, 
513-516 
discerning approach, 512-516 
discrimination approach, 512 
non-nested F test, 511-513 
tests of specification errors, 499-506 
and unbiasedness property, 546-547 
Econometrics: 
computer’s role in, 10 
definitions, 1 
as empirical verification of economic 
theory, 2 
mathematical prerequisites, 10 
methodology of, 2-9 
data gathering, 4—6 
econometric model specification, 
3-4
forecasting, 7 
hypothesis statement, 3 
hypothesis testing, 6-7 
mathematical model specification, 3 
model applications, 8 
model estimation, 5 
reading resources about, 10 
statistical prerequisites, 10 
types of, 9 
Economic forecasting, 820-822 
Economic theory, 2 
Economics, causality in, 686-691 
Efficient capital market hypothesis, 785 
Efficient estimators, 77 
Eigenvalues, 358 
Elasticity of demand, 18 


Encompassing F test, 512-513 
Encompassing model, 493 
Encompassing principle, 513 
Endogenous variables, 691 
Endpoint restrictions, 686 
Engel expenditure models, 178 
Engle—Granger (EG) test, 806-807 
Equality testing, of two regression 
coefficients, 262-264 
Equation error term, 507 
Error components model (see Random 
effects model) 
Error correction mechanism (ECM), 
807-808 
Error sum of squares, 556n 
Error term, 4 
Error-learning models, 387 
Errors of measurement, 27 
Errors of measurement bias, 494 
ESS (see Explained sum of squares) 
Estimable function, 344n 
Estimate, 48 
Estimated generalized least squares 
(EGLS), 469 
Estimated value, 5n 
of ARIMA model, 829 
in classical theory of statistical 
inference, 105 
of econometric model, 5 
maximum likelihood method, 
109-114 
simultaneous-equation methods, 
751-752 
bias in indirect least-squares 
estimators, 777-778 
examples, 764-769 
indirect least squares, 755-758 
recursive models and OLS, 
752-755 
standard errors of 2SLS 
estimators, 779 
two-stage least squares, 758-764 
in VAR model, 832-834 
Estimators, 47 
Event history analysis, 622 
Exact (just) identification, 731-734 
Exact level of significance (p value), 
129-136 
Exact micronumerosity, 345 
Exact relationship, 4 
Exogeneity, 691 
Exogeneity tests, 743 
Exogenous variables, 709n 


Expectations-augmented Phillips curve, 
182 

Expected value, 38n 

Experimental data, 2, 25 

Explained sum of squares (ESS), 79 

Explanatory variable, 3, 13 

Exponential distribution, 114 

Exponential functions, 199 

Exponential GARCH (EGARCH), 847 

Exponential regression model, 172 

Exponential smoothing methods, 821 

Extrapolation, 441


F


F test:
adding group of variables to, 262 
adding new variable to, 262 
of linear equality restrictions, 265-269 
of overall significance testing, 
256-257 
unit root tests of time series data, 801 
Factor analysis, 364 
Feasible GLS (FGLS) method, 469, 470 
Federal Reserve Bank of St. Louis, 781 
FEM (see Fixed effects model) 
FGLS (see Feasible GLS method) 
FIML (full information maximum 
likelihood) method, 752 
Finite (lag) distributed-lag model, 658 
Finite sample properties, 78 
First difference form, 363 
First difference operator, 441 
First-difference equation, 466 
First-difference method, 466-468 
First-order autoregressive (AR(1)), 443 
First-order coefficient of autocorrelation, 
443 
First-order correlation coefficients, 227 
First-order moving average (MA(1)), 
823 
Fixed effect estimators, 629 
Fixed effect LSDV model, 627-630 
Fixed effect WG estimator, 630-633 
Fixed effects model (FEM), 630 
Fixed regressors, 68 
Fixed values (assumption 2), 68 
Flexible accelerator model, 666 
Forecast chi-square, 519-520 
Forecast error, 7 
Forecast variable, 7 
Forecasting: 
ARIMA, 821-822
in BJ methodology, 830-831
as econometric modeling step, 7 
economic, 820-822 
exponential-smoothing, 821 
simultaneous-equation-regression, 831 
single-equation-regression, 821 
VAR, 822 
FRED database, 781 
Friedman’s permanent income 
hypothesis, 160 
Frisch—Waugh theorem, 311 
Full information maximum likelihood 
(FIML) method, 752 
Full information methods, 751 
Functional form: 
tests for incorrect, 501-506 
wrong, 494 


G 


G statistic, 467
“Game” of maximizing adjusted
coefficient of determination,
219-220
GARCH model (see Generalized
autoregressive conditional
heteroscedasticity model)
GARCH-M (GARCH in mean) model,
846
Gaussian linear regression model
(see Classical linear regression
model)
Gaussian white noise process, 784
Gauss-Markov theorem, 76-77
Gauss-Newton iterative method, 557
GDP (see Gross domestic product)
Geary test (see Runs test)
General F testing, 268-269
Generalized autoregressive conditional
heteroscedasticity (GARCH)
model, 471-472
Generalized (quasi) difference equation,
465
Generalized least squares (GLS),
392-395
Geriatric falls, 606-607
German Socio-Economic Panel
(GESOEP), 623
Glejser test, 400-401
Glogit model (see Grouped logit model)
GLS (see Generalized least squares)
GLS estimators, 393
GNP (gross national product), 2
Goldfeld-Quandt test, 403-405
Goodness of fit, 78-82
Goods market equilibrium schedule, 713
Gprobit model (see Grouped probit
model)
Granger causality test, 687-691
Granger representation theorem, 807
Graphical analysis, 792
Gravity, law of, 19
Gross domestic product (GDP), 4-6
Gross national product (GNP), 2
Grouped data, 585, 589
Grouped logit (glogit) model, 587-589
Grouped probit (gprobit) model,
596-598
Growth rate, instantaneous vs.
compound, 176
Growth rate formulas, 200-201
Growth rate measurement, 175-177


H 


H statistic, 491 
HAC standard errors 
(see Heteroscedasticity- and 
autocorrelation-consistent 
standard errors) 
Hat (^), 5n 
Hausman test, 634, 719 
Hazard rate, 603 
Heterogeneity, 626 
Heterogeneity effect, 626 
Heterogeneity problem, 23 
Heteroscedastic variances, 573-574 
Heteroscedasticity, 386-421
and autocorrelation, 472
defined, 70
detection of, 397-410
Breusch-Pagan-Godfrey test,
405-406
formal methods, 399-400
Glejser test, 400-401
Goldfeld-Quandt test, 403-405
graphical method, 398-399
informal methods, 397-399
Koenker-Basset test, 409-410
nature of problem, 397-398
Park test, 399-400
selection of test, 409
Spearman’s rank correlation test,
401-403
White’s general test, 407-408
and dummy variables, 315
examples of, 416-420
GLS method of correcting for,
392-395
nature of, 386-390
OLS estimation in presence of,
391-392
overreacting to, 420-421
patterns of, 412-416
remedial measures for, 410-416
assumptions about pattern of
heteroscedasticity, 412-416
White’s heteroscedasticity-
consistent variances/standard
errors, 412
WLS, 410-411
White’s standard errors corrected for,
434-435
Heteroscedasticity- and autocorrelation-
consistent (HAC) standard errors,
470
Heteroscedasticity-consistent covariance
matrix estimators, 412n
Histogram of residuals, 137-138
Historical regression, 133 
Holt’s linear method, 821 
Holt—Winters’ method, 821 
Homoscedasticity (assumption 4), 
70-71 
Hypothesis statement, 3 
Hypothesis testing, 121-131 
accepting or rejecting hypothesis, 127 
choosing approach to, 131 
choosing level of significance, 129 
in classical theory of statistical 
inference, 105 
confidence-interval approach to, 
121-122 
as econometric modeling step, 6-7 
exact level of significance, 129-130 
forming null/alternative hypotheses, 
128 
in multiple regression, 253-254 
statistical vs. practical significance,
130-131 
test-of-significance approach, 
122-126 
zero null hypothesis and 2-t rule of
thumb, 127-128


I


i (subscript), 21
Identification: 
in BJ methodology, 825-829
order condition, 736-737
rank condition, 738-740
rules for, 736-737
Identification problem, 707-708
defined, 729 
exact identification, 731-734 
notations/definitions used in, 726-729 
overidentification, 734-735
underidentification, 729-731 
Idiosyncratic term, 634 
ILS (see Indirect least squares) 
Impact multipliers, 653, 728 
Impulse response function (IRF), 836 
Impulses, 832 
Imputing values, 523 
Inclusion, of irrelevant variables, 494, 
498 
Income multiplier (M), 7 
Incremental contribution of explanatory 
variable, 259-262 
Independent variable, 3 
Indifference curves, 28 
Indirect least squares (ILS), 728 
Individual prediction, 135, 136 
Individual regression coefficients, 
251-253 
Individual-level data, 584 
Inertia, 439 
Infinite (lag) model, 658 
Influential point, 520 
Innovations, 832 
Instantaneous rate of growth, 176 
Institutions, 657 
Instrument validity, 705-706 
Instrumental variables, 508 
Instrumental variables (IV) method, 
670-671 
Integrated of order 1, 789 
Integrated of order 2, 789 
Integrated of order d, 790 
Integrated processes, 789-790 
Integrated stochastic processes, 789-790 
Integrated time series, 790 
Interaction among regressors, 495 
Interaction term, 278 
Interactive form, 305 
Intercept, 3 
Intercept coefficient, 41 
Intercorrelation, measurement of, 340n 
Interest rates: 
and Federal Reserve, 676-677 
and money, 689-690 
Internal Revenue Service (IRS), 27
Internet, 25
Interpolation, 441 
Interval estimation, 115-120 
confidence interval for σ², 119-120
confidence intervals for regression
coefficients β₁ and β₂, 117-119
defined, 116 
Interval estimators, 65, 116 
Interval scale, 28 
Intrinsically nonlinear regression 
models, 553-554 
Inverse Mills ratio, 603 
Investment data, 25, 26 
IRF (impulse response function), 836 
Irrelevant variables: 
inclusion of, 494, 498 
tests for, 499-500 
and unbiasedness property, 546-547 
IRS (Internal Revenue Service), 27 
IS model of macroeconomics, 713-714 
Iterative linearization method, 557 
Iterative methods, 468-469 


J 


J curve of international economics, 656
J test, 513-515
Jarque-Bera (JB) test, 137
Joint confidence interval, 119


K 


KB test (see Koenker-Basset test)
Keynesian consumption function, 3-4
Keynesian model of income
determination, 711-712
KISS principle, 535
Klein’s model I, 715
Klein’s rule of thumb, 358
Knot (known in advance threshold), 312
Koenker-Basset (KB) test, 409-410
Koyck model, 659-664
and adaptive expectations model,
664-666
combining adaptive expectations
and partial adjustment models,
668-669
example using, 662-664
mean lag in, 662
median lag in, 661-662
and partial adjustment model,
666-668
Koyck transformation, 661
Kruskal’s theorem, 397n, 446
Kurtosis, 138


L 


Labor economics, 18
Labor force participation (LFP), 570,
578-580
Lag(s):
and autocorrelation, 440-441
in economics, 653-657
length of, 796
reasons for, 657-658
Lag operator, 787n
Lagged endogenous variables, 727
Lagged values, 441
Lagrange multiplier (LM) model, 714
Lagrange multiplier (LM) test, 275-276,
505-506
(See also Breusch-Godfrey test)
Lag-weighted average of time, 662
Large sample theory, 534
Large-sample properties, 104
Latent variable, 594
Law of gravity, 19
Law of universal regression, 15
LB (Ljung-Box) statistic, 797
Least-squares criterion, 62
Least-squares dummy variable (LSDV)
model, 627-630
Least-squares estimates:
derivation of, 99
precision/standard errors of, 74-76
two-stage (see Two-stage least
squares)
Least-squares estimator(s), 65
consistency of, 104
linearity/unbiasedness of, 100
minimum variance of, 103-104
ordinary (see Ordinary least squares)
properties of, 76-78
for regression through the origin,
197-198
of σ², 101-102
variances/standard errors of, 101
Level form, 441
Level of significance, 116
choosing, 129
exact, 129-130
in presence of data mining, 499-500
Leverage, 520, 521
LF (see Likelihood function)
LFP (see Labor force participation)
LGDP time series, 794
Life-cycle permanent income hypothesis,
8 
Likelihood function (LF), 111 
Likelihood ratio (LR) statistic, 591 
Likelihood ratio (LR) test, 275-276 
Limited dependent variable regression 
models, 602 
Limited information methods, 751 
Linear equality restrictions testing, 
264-269 
F-test approach, 265-269 
t-test approach, 264-265
Linear function, 42n 
Linear in parameter (assumption 1), 68 
Linear population regression function, 
41 
Linear PRF, 41 
Linear probability model (LPM), 
572-577 
alternatives to, 581-582 
applications of, 578-581 
defined, 572 
effect of unit change on regressor 
value in, 599 
example, 575-577 
goodness of fit, 574-575 
heteroscedastic variances of 
disturbances, 573-574
nonfulfillment of E between 0 and 
1, 574 
non-normality of disturbances, 573 
Linear regression model(s), 44-45 
estimation of, 555 
example of, 4 
log-linear vs., 276-277 
nonlinear vs., 553-554 
Linear trend model, 177 
Linearity, 42-43
of BLUE, 77
of least-squares estimators, 100
in parameters, 42-43
in variables, 42 
Linearization method, 567-568 
Lin-log model, 175 
Ljung-Box (LB) statistic, 797 
LLF (see Log-likelihood function)
LM (Lagrange multiplier) model, 714 
LM test (see Lagrange multiplier test) 
Log hyperbola model, 184 
Logarithmic reciprocal model, 184 
Logarithms, 199-200 
Logistic distribution function, 554 
Logistic growth model, 559 
Logit model, 560-562
effect of unit change on regressor 
value in, 599 
estimation of, 584-586
grouped, 587-589 
ML estimation, 620-621 
multinomial, 608 
ordinal, 608 
probit vs., 599-601 
ungrouped data, 590-594 
Log-likelihood function (LLF), 621 
Log-lin model, 175-177 
Log-linear model, 172-175 
Log-log model, 172 
Log-normal distribution, 186 
Long panel, 624 
Longitudinal data (see Panel data) 
Longley data, 365-368 
Long-run multiplier, 654 
Lower confidence limit, 117 
LPM (see Linear probability model) 
LR (likelihood ratio) statistic, 591 
LR test (see Likelihood ratio test) 
LSDV model (see Least-squares dummy 
variable model) 
Lucas technique, 821 
Lurking variables, 630 


M 


MA (see Moving average) 
Maintained hypothesis, 120 
Mallows’s Cp criterion, 511
Manipulation of data, 441 
Manufacturing wages and exports, 53 
Marginal contribution of explanatory 
variable, 259-262 
Marginal propensity to consume (MPC), 
3, 6
Marginal propensity to save (MPS), 271 
Market Model of portfolio theory, 160, 
161 
Markov first-order autoregressive 
scheme, 443 
Marquard method, 557n 
Mathematical economics, 2 
Mathematical model of consumption, 
34 
Maximum likelihood (ML), 246 
example of, 43 
method of, 109 
of two-variable regression model, 
110-113
Mean prediction, 134-135
Mean reversion, 782 
Mean value, 39n 
Measurement, errors of, 27 
Measurement scales, 27-28
Mexican economy, 559 
Micronumerosity, 345 
Micropanel data (see Panel data) 
Minimum variance, 103-104
Minimum-variance unbiased estimators, 
107 
Missing data, 523 
ML (see Maximum likelihood) 
ML estimators, 210 
Model (term), 3
Model mis-specification errors, 495 
Model selection criteria, 493 
adjusted R², 517
Akaike’s information criterion,
517-518
caution about criteria, 519-520
forecast chi-square, 519-520
Mallows’s Cp criterion, 518-519
R² criterion, 517
Schwarz’s information criterion, 518
Model specification bias, 492 
Model specification errors, 492 
consequences of, 495-498 
tests of, 499-506 
Durbin-Watson d statistic, 501-503
Lagrange multiplier test for adding 
variables, 505-506 
nominal vs. true level of 
significance, 500 
omitted variables detection, 
501-506 
Ramsey’s RESET test, 503-505 
residuals examination, 501 
unnecessary variables detection, 
499-500 
types of, 493-495
Modified d test, 460 
Modified Phillips curve, 182 
MOM (see Method of moments) 
Moment, 91 
Monetary economics, 17, 18 
Money market equilibrium, 714 
Money supply function, 758 
Monte Carlo experiments, 88-89 
Monthly data, 22 
Moving average (MA), 461, 462 
MPC (see Marginal propensity to 
consume)
MPS (marginal propensity to save), 271
MSE estimator 
(see Mean-square-error 
estimator) 
Multicollinearity, 340-369 
assumption of no, 204 
defined, 340 
detection of, 356-360 
effects of, 365 
example, 351-356 
factors in, 342 
high but imperfect, 344 
Longley data example, 365-368 
nature of, 340-342 
perfect, 342-344 
practical consequences of, 346-351 
confidence intervals, 349 
micronumerosity, 351 
OLS-estimator variance, 346-349 
sensitivity to small changes in data, 
350-351 
t ratios, 349 
remedial measures, 360-364 
doing nothing, 360 
rule-of-thumb procedures, 360-366 
theoretical consequences of, 344-346 
Multinomial models, 608 
Multiple coefficient of correlation, 212 
Multiple coefficient of determination, 
210-211 
Multiple regression: 
estimation problem, 203-228 
hypothesis testing 
about individual regression 
coefficients, 251-253 
forms of, 250-251 
with LR/W/LM tests, 275-276
inference problem, 249-278 
linear equality restrictions testing, 
264-269 
F-test approach, 264-269 
t-test approach, 265 
linear vs. log-linear models, 276-277 
maximum likelihood estimation, 246 
normality assumption, 249-250 
overall significance testing, 253-262 
ANOVA, 254-256 
F test, 254-256 
incremental contribution of 
explanatory variable, 259-262 
R² and F relationship, 257-258
in terms of R², 258-259
partial correlation coefficients,
226-228
polynomial regression models, 
223-226 
prediction with, 275 
specification bias in, 214-215 
structural/parameter stability testing, 
270-275 
testing equality of two regression 
coefficients, 262-264 
three-variable model 
adjusted R², 215-220
Cobb-Douglas production function, 
220-222 
estimation of partial regression 
coefficients, 207-210 
example, 212-213 
interpretation of regression 
equation, 205 
multiple coefficient of correlation,
212
multiple coefficient of 
determination, 210-211 
notation/assumptions, 203-205 
partial regression coefficients,
205-207
standardized variables, regression 
on, 213 
Multiple regression analysis, 20 
Multiple regression model, 14 
Multiple-equation model, 3 
Multiplicative effect, 494 
Multiplicative form, 305 
Mutual fund advisory fees, 558-559
MWD test, 276-277 


N 


N (number of observations), 21
Natural logarithms, 199, 200
Nature of X variables (assumption 7),
73-74
N.e.d. (normal equivalent deviate), 597
Negative correlation, 71
Neo-classical linear regression model
(NLRM), 68
Nested models, 510
Newey-West method, 470
Newton-Raphson iterative method, 557
Newton’s law of gravity, 19
NID (normally and independently
distributed), 106
NLLS (nonlinear least squares), 555
NLRM (see Nonlinear regression
models)
NLRM (neo-classical linear regression
model), 68
No autocorrelation between disturbances
(assumption 5), 72-74
Nominal level of significance, 500
Nominal regressand, 571
Nominal scale, 28
Nonexperimental data, 25, 27
Nonlinear least squares (NLLS), 555
Nonlinear regression models (NLRM),
42, 43
direct optimization, 557
direct search method, 557
estimation of, 555
examples, 558-561
iterative linearization method, 557
linear vs., 553-554
trial-and-error method, 555-556
Non-nested F test, 511-513
Non-nested hypotheses tests, 511-516
Davidson-MacKinnon J test, 513-516
discerning approach, 512-516
discrimination approach, 511
non-nested F test, 511-513
Non-nested models, 510-511
Non-normal error distribution, 533-534
Non-normality, of disturbances, 573
Nonparametric statistical methods, 801
Nonparametric tests, 455n
Nonresponse, 27
Nonsense regression, 780
Nonstationary stochastic processes,
784-787
Nonstationary time series, 784
Nonsystematic component, 44
Normal distribution, 154-156
Normal equations, 64
Normal equivalent deviate (N.e.d.), 597
Normal probability plot (NPP), 137, 138
Normality (assumption 10), 249-250
for disturbances, 106
properties of OLS estimators under,
107-109
reasons for using, 107
of stochastic distribution, 335
Normality tests, 137-139
histogram of residuals, 137
Jarque-Bera test, 138
normal probability plot, 137, 139
Normally and independently distributed
(NID), 106
Normit, 597
Normit model (see Probit model)
Not statistically significant, 122
NPP (see Normal probability plot)
Nuisance parameters, 627
Nuisance variables, 629
Null hypothesis, 120, 127
Number crunching, 499
Numerator degrees of freedom, 155
Numerical properties, of estimators, 65
NYSE price changes example, 841-842


O 


Observational data: 
assumption about, 73 
experimental vs., 2 
quantity of, 73 
Occam’s razor, 46 
Odds ratio, 583 
Ohm’s law, 19 
OLS (see Ordinary least squares) 
and autocorrelation, 443-451 
and heteroscedasticity, 391-392 
OLS estimators, 207-210
derivation of, 243-244
inconsistency of, 715-718 
multicollinearity and variance of, 
347-349 
properties, 107-109 
properties of, 209-210 
sensitivity of, 350-351 
variances and standard errors of, 
208-209 
OLS standard-error correction, 476 
Omission, of relevant variable, 494 
Omitted category, 299 
Omitted variables, 501-506 
One-sided hypothesis, 122 
One-tail hypothesis test, 122 
One-tail test of significance, 125 
One-way fixed effects, 629 
Order condition of identifiability, 
736-737 
Ordinal models, 608 
Ordinal regressand, 571 
Ordinal scale, 28 
Ordinary least squares (OLS), 61-96 
(See also OLS estimation; OLS 
estimators) 
assumptions, 67-74 
examples of, 84-89 
Gauss-Markov theorem, 76-78
GLS vs., 394-395
goodness of fit, 78-82 
method of, 61-67 
and Monte Carlo experiments, 88-89 
precision/standard errors, 74-76 
and recursive models, 753-755 
Orthogonal polynomials, 364 
Outliers, 388 
Overall significance testing: 
ANOVA, 254-256 
F test, 256-257 
incremental contribution of 
explanatory variable, 259-262 
individual vs. joint, 257 
in multiple regression, 253-262
R² and F relationship, 257-258
in terms of R², 258-259
Overdifferencing, 804 
Overfitting, of model, 498 
Overidentification, 734-735 
Overidentified equation, 758-761 


P 


Pair-wise correlations, 356-357 
PAM (see Partial adjustment model) 
Panel data, 25, 26 
Panel data models, 622-643
advantages of, 623-624
dummy variables in, 314
estimators, properties of, 637
examples of, 624-625
fixed effect LSDV model, 627-630
fixed effect within-group estimator,
630-633
pooled OLS regression model,
625-627
random effects model, 633-636
selection guidelines, 637-638
Panel Study of Income Dynamics 
(PSID), 622 
Panel-corrected standard errors, 637 
Parallel regressions, 303 
Parameter constancy, 493 
Parameters, 3 
Park test, 399-400 
Parsimony, 46 
Partial adjustment model (PAM), 
666-668
Partial correlation coefficients, 226-228 
Partial correlations, 357 
Partial regression coefficients, 204 
PCE (see Personal consumption 
expenditure)
PDF (see Probability density function)
PDL (see Polynomial distributed lag) 
Percent growth rate, 172n 
Percentage change, 172n 
Percentages, logarithms and, 200 
Perfect collinearity, 299 
Perfect multicollinearity, 342-344 
Permanent consumption, 45 
Permanent income hypothesis, 8-9
Personal computers, 87-88 
Personal consumption expenditure 
(PCE), 5, 781 
Phenomenon of spurious regression, 
790-791 
Phillips curve, 18 
Phillips-Perron (PP) unit root tests, 801
Piecewise linear regression, 311-313
Pindyck-Rubinfeld model of public
spending, 742 
Plim (probability limit), 717 
Point estimation, 115 
Point estimators, 4, 65 
Poisson process, 571 
Poisson regression model, 604-607
Policy purposes, model used for, 8 
Polychotomous variable, 571 
Polynomial distributed lag (PDL), 
679-686 
Polynomial regression, 223-226
Polytomous dependent variable, 315 
Pooled data, 622 
Pooled estimators, 637 
Pooled OLS regression model, 625, 627 
Pooled regression, 271 
Population, 38 
Population correlogram, 792 
Population growth, 559-560 
Population regression (PR), 41 
Population regression curve, 40 
Population regression function 
(PRF), 41-45 
Population regression line (PRL), 40, 41 
Population transformation, 562 
Power: 
of statistical test, 463n 
of the test, 129, 404n
of unit root tests, 801 
PP (Phillips-Perron) unit root tests, 801
PR (population regression), 41 
Practical significance, statistical vs., 
130-131 
Prais-Winsten transformation, 465
Precedence, 687
Precision, 74-76
Predetermined variables, 727 
Prediction (See also Forecasting) 
individual, 135-136 
mean, 134-135 
with multiple regression, 275 
Predictive causality, 687 
Predictor variable, 8 
Pretest bias, 219n 
Pretesting, 500 
PRF (see Population regression function) 
Price elasticity, 17 
Principal components technique, 364 
PRL (see Population regression line) 
Probability distribution(s), 107, 108 
of disturbances, 105-106 
normal distribution related to, 
154-156 
Probability limit (plim), 717 
Probability of committing Type I error, 
116n, 129 
Probit model, 594-599 
effect of unit change on regressor 
value in, 599 
with grouped data, 596-598 
logit vs., 599-601 
ML estimation, 620-621 
multinomial, 608 
ordinal, 608 
with ungrouped data, 598-599 
Productivity, 639-640 
Proxy variables, 45 
PSID (Panel Study of Income 
Dynamics), 622 
Psychology, 657 
Pth-order autoregressive (AR(p)), 823 
Pure autocorrelation, 463-464 
Pure random walk, 788 
Purely random process, 784 


Q 


Q statistic, 796-797
Qth-order moving average (MA(q)), 823
Quadratic function, 223
Qualitative response models, 570-608
duration models, 608
linear probability model, 572-582
logit model, 582-594
multinomial models, 608
nature of, 570-572
ordinal models, 608
Poisson regression model, 604-607
probit model, 594-599
selection of model, 599-601
tobit model, 602-606
unit change in value of regressor in,
599
Qualitative variables, 14 
Quality, of data, 27 
Quarterly data, 22 
Quasi-difference equation, 465 
Quinquennial data, 22 


R 


R² criterion, 517
Ramsey’s RESET test, 503-505
Random (term), 20
Random effects estimators, 637
Random effects model (REM), 633-638
Random interval, 116
Random regressor case, 534, 535
Random (stochastic) variable, 4, 19
Random walk model (RWM), 784-789
Random walk phenomenon, 780
Random walk time series, 793
Randomness, 45
Rank condition of identifiability,
738-740
Rare event data, 571
Ratio scale, 28
Ratio transformation, 363
Rational expectations (RE) hypothesis,
666
Raw r², 162
RE (rational expectations) hypothesis,
666
Real consumption function, 529-532
Realization of possibilities, 784
Real-time quote, 22
Reciprocal models, 179-184
Recursive least squares (RELS), 522
Recursive models, 753-755
Recursive residual test, 275
Recursive residuals, 522
Reduced-form coefficients, 727, 728
Reduced-form equations, 727, 728
Reference category, 299
Region of acceptance, 123
Region of rejection, 123
Regressand, 21
Regression:
historical origin of term, 15
through the origin, 159-166
on standardized variables, 170-171
Regression analysis, 15-21 
and analysis of variance, 131-133
and causation, 19-20 
and correlation, 20 
data for, 21-28 
defined, 15 
for estimation, 5 
evaluating results of, 137-140 
examples of, 16-18 
measurement scales of variables, 
27-28 
prediction problem, 133-136
reporting results of, 136 
statistical vs. deterministic 
relationships in, 19 
terminology/notation used in, 20 
Regression coefficients, 41 
Regression fishing, 499 
Regression line, 16 
Regression model(s), 171-172 
Box-Cox, 202
elasticity measurement, 172-175 
growth measurement, 175-179 
log-linear model, 172-174 
reciprocal models, 179-186
semilog models, 175-179
and stochastic error, 186-187 
Regression software, 10 
“Regression to mediocrity,” 15 
Regressor, 20 
Rejecting hypothesis, 127 
Relative (proportional) change, 172n 
Relative frequency, 585 
Relevant variable, omission of, 494, 
495-497 
RELS (recursive least squares), 522 
REM (see Random effects model) 
Repeated sampling, 89 
Replicated data, 585-586 
Reproductive property, 155 
Residual sum of squares (RSS), 75, 79 
Residuals, 468 
Restricted F test, 629 
Restricted least squares (RLS), 
265-267 
Restricted residual sum of squares 
(RSS_R), 272-274
Ridge regression, 364 
RLS (see Restricted least squares) 
Robust estimation, 338n 
Robust standard errors, 412 
S


Sample autocorrelation function
(SAFC), 792
Sample correlation coefficient, 82
Sample correlogram, 792
Sample covariance, 792
Sample regression function (SRF),
46-49
Sample regression line, 46
Sample variance, 792
Sampling, 27
Sampling distribution, 74n, 77
Sargan test, 705-706
Scale factors, 166-169
Scaling, 166-170
Scatter diagram (scattergram), 16
Scatterplot, 359-360
Schwarz’s information criterion (SIC),
518
Seasonal analysis, 309-312
Seasonality, 831
Second-order autoregressive (AR(2)),
823
Second-order moving average (MA(2)),
823
Second-order stationary, 784
Security market line (SML), 160
Seemingly unrelated regression (SURE)
model, 630n, 754n, 832n
Self-selection bias, 573
Semielasticity, 176
Semilog models, 175-179
Semilogarithmic regressions, 314
Serial correlation, 436-439
Serial correlation model, 694
Shocks, 832
Short panel, 624
Short-run multiplier, 653
SIC (see Schwarz’s information
criterion)
Simple correlation coefficients, 226-228
Simple hypothesis, 120
Simple regression analysis (see Two-
variable regression analysis)
Sims test of causality, 686n
Simultaneity test, 740-742
Simultaneous-equation bias, 714-719
Simultaneous-equation methods,
751-769
estimation approaches, 751-752
bias in indirect least-squares
estimators, 777-778
examples, 764-769
indirect least squares, 755-758
recursive models and OLS,
753-755
standard errors of 2SLS estimators,
779
two-stage least squares, 758-764
Simultaneous-equation models, 709-720 
examples of, 710-715 
nature of, 709-710 
Simultaneous-equation regression 
models, 821 
Single exponential smoothing, 821 
Single-equation methods, 752 
Single-equation model, 3 
Single-equation regression models, 13, 
821 
Size: 
of the statistical test, 116n 
of unit root tests, 801 
Size effect, 24 
Skewness, 138 
Slope, 3, 41 
Slope drifter (see Differential slope 
coefficients) 
SML (security market line), 160 
Spatial autocorrelation, 436 
Spearman’s rank correlation coefficient, 
91 
Spearman’s rank correlation test, 
401-403 
Specification bias, 69 
assumption regarding, 204 
excluded variable, 439 
incorrect function form, 439-440 
and multicollinearity, 362 
in multiple regression, 214-215 
Specification error, 69, 162 
Spline functions, 312 
Spurious correlation, 416 
Spurious regression, 780, 790-791 
Square root transformation, 414 
SRF (see Sample regression function) 
SRM (see Switching regression models) 
St. Louis revised model, 769-770 
Stability condition, 797n 
Standard error(s): 
defined, 74n 
of estimate, 76 
of least-squares estimates, 74-76 
of least-squares estimators, 101 
of OLS estimators, 208-209 
of regression, 76
in 2SLS estimators, 779
Standard linear regression model 
(see Classical linear regression 
model) 
Standard normal distribution, 108 
Standardized residuals, 453 
Standardized variables, 170-172 
Statement of theory or hypothesis, 3 
Stationarity, 22 
Stationarity, tests of, 791-797 
autocorrelation function/correlogram, 
792-796 
graphical analysis, 792 
statistical significance of 
autocorrelation coefficients, 
796-797 
Stationary stochastic processes, 784 
Stationary time series, 780 
Statistic (term), 47 
Statistical inference, 7 
Statistical properties, 65, 74 
Statistical relationships, 19 
Statistical significance: 
of autocorrelation coefficients, 
796-797
practical vs., 130-131
Statistically significant, 122 
Steepest descent method, 557 
Stepwise backward regression, 372 
Stepwise forward regression, 372 
Stochastic (term), 19n, 20 
Stochastic disturbance, 44-46
Stochastic error term, 44, 186-187 
Stochastic explanatory variables, 534 
Stochastic PRF, 51 
Stochastic processes, 784-787 
integrated, 789-790 
nonstationary, 784-787 
stationary, 783-784 
trend stationary/difference stationary, 
788-789 
unit-root, 787 
Stochastic regressor model, 68 
Stochastic time series, 788 
Stochastic trend, 788 
Stock adjustment model, 666 
Strictly exogenous regressors, 493 
Strictly exogenous variables, 625 
Strictly white noise, 784n 
Structural breaks, 801 
Structural changes, testing for, 275 
Structural coefficients, 727 
Structural equations, 727
Studentized residuals, 453n
Student’s t distribution, 820
Student’s t test, 798
Survival analysis, 608
Switching regression models (SRM),
312n, 318
Systematic component, 44


T 


t (subscript), 21
T (total number of observations), 21
t ratios, 349, 350
τ (tau) statistic, 798-799
t test, 122-126
Target variable, 8
Taylor’s series expansion, 557
Taylor’s theorem, 567-568
Technology, 657
“Ten Commandments of Applied
Econometrics” (Peter Kennedy),
535
Test of significance, 122-126
χ² test, 126
confidence interval vs., 131
overall (see Overall significance
testing)
t test, 122-125
Test statistic, 122
Tests of non-nested hypotheses, 511-516
Davidson-MacKinnon J test, 513-516
discerning approach, 512-516
discrimination approach, 512
non-nested F test, 511-513
Tests of specification errors, 499-506
Texas economy application, 836-837 
TGARCH (threshold GARCH), 846 
Theoretical econometrics, 9 
Three-variable regression model:
adjusted R², 218-220
Cobb-Douglas production function,
220-222
estimation of partial regression
coefficients, 207-212
example, 212-213
interpretation of regression equation,
206
multiple coefficient of correlation, 212
multiple coefficient of determination,
210-211
notation/assumptions, 203-204
partial regression coefficients,
205-207
specification bias, 214-215
standardized variables, regression
on, 213
Threshold GARCH (TGARCH), 847
Threshold level, 594
Time derivative, 754n
Time effect, 629
Time sequence plot, 453
Time series data, 780-811
approaches to, 820-822
Box-Jenkins methodology, 824-831
cointegration, 805-808
and cross-section data, 622
and cross-sectional data, 361
defined, 20-21
economic applications, 808-811
examples of, 843-845
key concepts with, 782
modeling, 822-824
spurious regression phenomenon with,
790-791
stationarity, tests of, 791-797
stochastic processes, 783-790
transforming nonstationary time series
to, 802-804
unit root tests, 797-802
U.S. economy, 781-782
vector autoregression, 831-837
volatility measurement in, 838, 843
Time series econometrics, 22 
Time-invariant variable, 626, 628 
Time-to-event data analysis, 608 
Time-variant variable, 627 
Tobit model, 602-607 
Tolerance, 358 
Total sum of squares (TSS), 79 
Toxicity study, 617 
Traditional econometric methodology, 
2-3 
Transformation of variables, 362-363 
Trend stationary, 788 
Trend stationary process (TSP), 787 
Trend stationary (TS) stochastic 
processes, 787-789 
Trends, 21 
Trend-stationary processes, 803-804 
Trial-and-error method, 555-556 
Triangular (arithmetic) distributed-lag 
model, 695 
Triangular models, 752, 753n 
Trichotomous variable, 571 
True level of significance, 500 
Truncated sample, 602n
TS stochastic processes (see Trend
stationary stochastic processes) 
TSP (trend stationary process), 781 
TSS (total sum of squares), 79 
2SLS (see Two-stage least squares) 
2-t rule of thumb, 127 
Two-sided hypothesis, 121-122 
Two-stage least squares (2SLS), 
758-764 
Two-tail hypothesis test, 121-122
Two-tail test of significance, 125 
Two-variable linear regression model, 13 
Two-variable regression analysis, 20 
examples of, 49-51 
linearity in, 42-43
population regression function, 41-42
sample regression function, 46-49
stochastic disturbance in, 44-46
stochastic specification of PRF, 43-45 
Two-variable regression model, 159-188 
elasticity measurement, 172-175 
classical linear regression model, 
67-74 
coefficient of determination r²,
78-82 
examples, 83-88 
Gauss—Markov theorem, 76-78 
Monte Carlo experiments, 88-89 
ordinary least squares method, 
61-67 
precision/standard errors, 74-76 
functional models of, 171-172 
log-linear model, 172-174 
reciprocal models, 179-184 
selection, 184-185
semilog models, 175-179 
growth measurement, 175-179 
hypothesis testing, 121-131 
accepting/rejecting hypothesis, 127 
choosing level of significance, 129 
confidence-interval approach, 
121-122 
exact level of significance, 129-130 
forming null/alternative hypotheses, 
128 
selection of method, 131 
statistical vs. practical significance, 
130-131 
test-of-significance approach, 
122-126 
zero null hypothesis/2-t rule, 
127-128 
hypothetical example of, 38-41
interval estimation, 115-120
confidence intervals, 117-120 
statistical prerequisites, 115
regression through the origin,
159-166
and scaling/units of measurement, 
167-170 
on standardized variables, 170-171 
and stochastic error, 186-187
Two-way fixed effects model, 629 
Type I error, 116n, 121n 
Type II error, 129


U 


Unbalanced panel, 25 
Unbiasedness, 546-547 
assumption regarding, 204 
of BLUE, 77 
of least-squares estimators, 100 
Unconditional expected value, 39 
Underdifferencing, 804 
Underfitting, of model, 495-497 
Underidentification, 729-731 
Underprediction, 7 
Ungrouped data, 590-594 
Unit change in value of regressor in, 213 
Unit root problem, 787 
Unit root stochastic processes, 787 
Unit root tests: 
augmented Dickey-Fuller test,
800 
critique, 801-802 
F test, 801 
Phillips-Perron, 801
structural changes testing, 801 
time series data, 797-802 
Units of measurement, 176 
Universal regression, law of, 15 
University of Michigan, 21 
Unobservable variable, 634 
Unobserved effect, 626 
Unrestricted residual sum of squares
(RSS_UR), 273-274
Upper confidence limit, 116 
Upward trend, 177 
U.S. Census Bureau, 21 
U.S. Department of Commerce, 
22 
U.S. economic time series, 781-782 
U.S. inflation rate, 844-845 
U.S. Treasury bills examples, 
810-811 
Utility index, 594 
Vagueness, of theory, 45
Validity, of instruments, 705-706 
VAR model (see Vector autoregression 
model) 
Variables: 
dropping, 362 
measurement scales of, 27-28 
standardized, 198 
transformation of, 362-363 
Variance: 
of individual prediction, 157-158 
of least-squares estimators, 101-102
of mean prediction, 157-158
of OLS estimators, 208-209 
variation vs., 79n 
Variance-inflating factor (VIF), 347 
Variation, variance vs., 79n 
Vector autoregression (VAR) model, 687 
causality, 834 
estimation, 832-834 
forecasting, 834 
problems with, 834-836 
Texas economy application, 836-837
time series data, 831-837
Venn diagram, 78 
VIF (see Variance-inflating factor) 


Volatility, 838 
Volatility clustering, 820 
Volatility measurement: 
ARCH presence, 842 
Durbin-Watson d and ARCH effect,
842-843 
in financial time series, 838-843 
GARCH model, 843 
NYSE price changes example, 
841-842 
U.S./U.K. exchange rate example, 
838-841 
Von Neumann ratio, 476 
Wald test, 275-276, 315n
Weakly exogenous regressors, 493
Weakly stationary, 784
Weekly data, 21
Weierstrass’ theorem, 680
Weighted least squares (WLS), 394
WG estimator (see Within-group
estimator)
White noise error, 443
White noise process, 784
White’s general heteroscedasticity test,
406-410
White’s heteroscedasticity-consistent
standard errors, 412
Wide sense, stochastic process, 783
Wiener-Granger causality test, 687n
Within-group (WG) estimator, 630-633
WLS (see Weighted least squares)
WLS estimators, 394
X (explanatory variable), 21
assumption on nature of, 73 
independence of, 68 
Y (dependent variable), 20
Zellner SURE estimation technique,
754n
Zero contemporaneous correlation, 753
Zero correlation, 82
Zero mean value of u_i (assumption 3),
69-70
Zero null hypothesis, 127
Zero-intercept model, 161-162
BASIC
ECONOMETRICS


The fifth edition of Basic Econometrics continues to blend foundations of 
econometrics with up-to-date research. It illustrates important concepts 
through intuitive and informative examples & data without resorting to matrix
algebra, calculus or statistics beyond the elementary level. It presents not only 
the ‘what’ and the ‘how’ of econometrics, but also the ‘why’ and successfully 
provides thorough yet highly lucid descriptions of all the key econometric topics. 